Principal Component Analysis Flashcards

1
Q

Life is complicated

A
  • When we try to measure a single psychological phenomenon, there are many things that may be said to measure that particular process.
  • For example, an attitudes questionnaire does not ask one question; it asks many different ones
    • We want the questionnaire to broadly measure the same thing
  • However, all the questions that you ask have to tap into the same underlying construct.
  • It is not just questionnaires – behavioural measures can be explored in this way
  • Self report impulsivity
  • Delay discounting (impulsive decision making)
  • Go/No-Go task performance (disinhibition)
  • Stop signal task performance (response cancellation)
  • BART (risk taking)
  • Time estimation
  • All of these measures are said to tap into one construct, but are they all the same? I.e. will an individual’s performance be highly correlated across all tasks?
    Lots of studies have investigated the structure of impulsivity
2
Q

PCA

A
  • A PCA allows us to take many items and reduce the dimensionality of a construct.
  • Can be used to check that the measure we are using is measuring what we intend
    • Also useful when creating a questionnaire: how many factors it measures and which questions go into each factor
  • E.g. Allison et al. (2014) found that “scripted responses”, “talking about unrelated topics” and “revealing well known information” were all highly associated with each other, and can therefore be combined into a single measure, verbal CITS
  • Conduct analyses on the factors: this reduces the number of tests and so reduces the chance of a Type I error
  • Ergo we have reduced the number of dimensions to analyse: three have gone into one.
  • Reduces likelihood of false positives because we test the one construct, not all three separate measures
  • Essentially a PCA looks at which items correlate with each other (in the R-matrix) and groups them into a component/factor
  • The PCA (and factor analysis) is based on an R-matrix of Pearson’s correlations.
  • What is the issue with this, particularly given that it is used on Likert scales?
    • Likert data are ordinal, so they are not parametric, and Pearson’s correlation does not account for ordinal data
    • Only way to counteract is to have lots of participants
  • The PCA essentially finds underlying constructs from a larger data set.
  • It will also tell us how important each component is (i.e. what % of variance in the data set it accounts for)
    • How important is each of these constructs?
  • This statistic is called an Eigenvalue (note it is one word)
  • A PCA extracts as many components as there are items, and you use the Eigenvalues to judge whether the components are worth keeping
  • Kaiser’s rule: Eigenvalues greater than 1 mean the component is valid
  • Jolliffe (1972, 1986): Eigenvalues > .7 are valid in a PCA
    (no strict rule!)
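As a minimal sketch of the two retention rules above, the Python below takes a set of Eigenvalues, works out the % of variance each one accounts for, and counts how many components each rule would keep. The six Eigenvalues are made-up illustrative values (not data from the lecture) and the function name `summarise_components` is hypothetical. It assumes the PCA was run on a correlation matrix, in which case the Eigenvalues sum to the number of items.

```python
def summarise_components(eigenvalues):
    """Return % variance per component and the number kept under each rule."""
    # For a PCA on a correlation matrix, the Eigenvalues sum to the number of items
    n_items = len(eigenvalues)
    percent_variance = [100 * ev / n_items for ev in eigenvalues]
    kaiser_kept = sum(1 for ev in eigenvalues if ev > 1.0)    # Kaiser's rule
    jolliffe_kept = sum(1 for ev in eigenvalues if ev > 0.7)  # Jolliffe's cut-off
    return percent_variance, kaiser_kept, jolliffe_kept


# Hypothetical Eigenvalues for a 6-item measure (illustrative only)
example_eigenvalues = [2.8, 1.5, 0.8, 0.5, 0.3, 0.1]
percent_variance, kaiser_kept, jolliffe_kept = summarise_components(example_eigenvalues)

for number, (ev, pct) in enumerate(zip(example_eigenvalues, percent_variance), start=1):
    print(f"Component {number}: Eigenvalue = {ev:.1f}, variance = {pct:.1f}%")
print("Kept under Kaiser's rule:", kaiser_kept)
print("Kept under Jolliffe's cut-off:", jolliffe_kept)
```

With these illustrative values the two rules disagree: Kaiser’s rule keeps two components and Jolliffe’s cut-off keeps three, which is one reason there is no strict rule.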
3
Q

Kaiser’s rule

A
  • Eigenvalue over 1: component is valid
  • The example data give a 2-component solution, as 2 Eigenvalues are above 1
  • Scree plots plot Eigenvalues
    • When the graph flattens, there are no more components worth keeping
  • The example scree plot shows a 2-component solution
4
Q

component loadings

A
  • The PCA will give us a number of components, but as you can see from the Eigenvalue table and the scree plot, this doesn’t tell us which of our individual measures make up each component.
  • Component loadings tell us this: they tell you what the association is between each item and each component
  • Essentially they are a Pearson’s correlation between the item and the factor/component
  • They are reported in the component matrix
  • Generally people take a component loading of .4 to be a strong enough loading
  • Sometimes people will actually calculate if it is significant as a loading by treating it as a correlation coefficient
  • You can therefore work out the p value if you know the sample size
  • BUT
    Often PCA/FA are done on large samples – what is the problem with simply working out whether the item significantly loads onto a factor?
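To illustrate the large-sample question above, here is a rough sketch that treats a loading as a Pearson correlation and works out a two-tailed p value from the sample size. The function name `loading_p_value` and the sample sizes are hypothetical, and the p value uses a normal approximation to the t distribution (a simplifying assumption that is only reasonable for large samples), so treat the numbers as approximate.

```python
import math


def loading_p_value(loading, n_participants):
    """Approximate two-tailed p value for a loading treated as a Pearson correlation.

    Uses t = r * sqrt((n - 2) / (1 - r^2)) and, as a simplifying assumption,
    a normal approximation to the t distribution (reasonable for large n).
    """
    t = loading * math.sqrt((n_participants - 2) / (1 - loading ** 2))
    return math.erfc(abs(t) / math.sqrt(2))


weak_loading = 0.1  # far below the conventional .4 cut-off
p_n100 = loading_p_value(weak_loading, 100)
p_n1000 = loading_p_value(weak_loading, 1000)

print(f"loading = {weak_loading}, N = 100:  p = {p_n100:.3f}")
print(f"loading = {weak_loading}, N = 1000: p = {p_n1000:.4f}")
```

With N = 1000 a loading of .1 comes out as significant even though it is far below .4, so in large samples statistical significance says very little about whether a loading is strong enough to matter.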
5
Q

rotation

A
  • Never have to explain this in detail- just need to know when certain rotations are used
  • To simplify the data and the analysis, we apply a rotation.
  • Yaremko et al. (1986) define rotation as:
  • “In factor or principal-component analysis, rotation of the factor axes (dimensions) identified in the initial extraction of factors, in order to obtain simple and interpretable factors.”
  • This is because the components (i.e. the groups of variables that make them) are just mathematical outputs and are not directly linked to psychological phenomena in their raw form. This is due to the axes being indiscriminate.
  • Two forms of rotation
  • Orthogonal rotation methods assume that the factors in the analysis are uncorrelated (Gorsuch 1983); the most commonly used is the varimax rotation.
  • Oblique rotation methods assume that the factors are correlated; the most commonly used is the oblimin rotation
  • This is theoretically driven; you choose the rotation according to whether evidence suggests your components/factors will be correlated or not.
    The example in this lecture uses Varimax; Oblimin produces slightly different outputs
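As a rough illustration of what an orthogonal rotation does (not something you need to reproduce), the sketch below turns the axes of a two-component solution by a fixed angle. The loadings are hypothetical and the 45 degree angle is picked by hand for this example; a real varimax rotation searches for the angle itself, so this is not the varimax algorithm. The function names are also made up for illustration.

```python
import math


def rotate_loadings(loadings, angle_degrees):
    """Apply an orthogonal rotation to a two-component loading matrix."""
    c = math.cos(math.radians(angle_degrees))
    s = math.sin(math.radians(angle_degrees))
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]


def communalities(loadings):
    """Sum of squared loadings per item (variance explained by the components)."""
    return [a ** 2 + b ** 2 for a, b in loadings]


# Hypothetical unrotated loadings: every item loads on both components
unrotated = [(0.6, 0.6), (0.6, 0.6), (0.6, -0.6), (0.6, -0.6)]
rotated = rotate_loadings(unrotated, 45)  # angle chosen by hand, not optimised

for before, after in zip(unrotated, rotated):
    print(before, "->", tuple(round(x, 2) for x in after))
```

After the rotation each item loads strongly on one component and close to zero on the other, which is the “simple and interpretable” structure the definition refers to, while each item’s communality is unchanged because the axes stay at right angles.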
6
Q

assumptions

A
  • Variables should not be nominal (this requires a CATPCA); given what I said about component loadings being Pearson’s correlations, this should be somewhat obvious!
  • Sampling adequacy
    • Assesses whether or not PCA is appropriate for your data using Kaiser-Meyer-Olkin factor adequacy
  • Is PCA appropriate for the data?
  • KMO assesses the proportion of variance among variables that may be common variance (i.e. explained by an underlying component or factor)
    • Kaiser-Meyer-Olkin measure of sampling adequacy should be above .5 to be acceptable (if it is lower, collect more data).
    • 0.5 - 0.7 = average
    • 0.7 - 0.8 = good
    • 0.8 - 0.9 = great
    • 0.9+ = excellent
    • (Hutcheson and Sofroniou 1999)
    • Others say a KMO below 0.7 is inadequate; for the purposes of this module we will use Hutcheson and Sofroniou’s (1999) cut-offs
  • Sufficient correlations between individual variables to run a PCA
  • Remember the R Matrix?
    • Run a test to see if the R matrix is an identity matrix, i.e. make sure there is correlation between variables
  • We make sure that the R matrix is not an identity matrix
  • An identity matrix has 1s on the diagonal and 0s everywhere else: each variable correlates perfectly with itself and not at all with any other variable
  • Doing a PCA on this is obviously pointless: everything is unrelated!
  • This would give you a 6 factor solution (guaranteed!)
  • Bartlett’s test of sphericity
    • This tests the null hypothesis that the correlations represent an identity matrix, so we want this to be significant (p<.05)
  • Bartlett’s test of sphericity demonstrated that correlations between items were large enough for PCA (χ²(10) =152.96, p<.001).
  • Bartlett’s test of sphericity
    Although it is reported as good practice, personally I think Bartlett’s test is utterly useless and the only way you can fail to meet this assumption is to read in the wrong data.
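The two checks above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the routine any particular stats package uses: the R matrices are hypothetical (5 items, either all uncorrelated or all correlating at .3), N = 100 is made up, and the helper names are invented for this example. Bartlett’s statistic is computed as (N - 1 - (2p + 5)/6) × ln(1/|R|) with p(p - 1)/2 degrees of freedom, and the overall KMO as the sum of squared correlations divided by the sum of squared correlations plus squared partial correlations.

```python
import math


def inverse_and_determinant(matrix):
    """Gauss-Jordan elimination with partial pivoting; returns (inverse, determinant)."""
    n = len(matrix)
    # Augment with the identity: [A | I] is reduced to [I | inverse of A]
    aug = [[float(x) for x in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        if pivot != col:
            aug[col], aug[pivot] = aug[pivot], aug[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot_value = aug[col][col]
        det *= pivot_value
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug], det


def bartlett_sphericity(r_matrix, n_participants):
    """Chi-square statistic and degrees of freedom for Bartlett's test."""
    p = len(r_matrix)
    _, det = inverse_and_determinant(r_matrix)
    # ln(1 / |R|) equals -ln|R|; an identity matrix has |R| = 1, giving 0
    chi_square = (n_participants - 1 - (2 * p + 5) / 6) * math.log(1 / det)
    return chi_square, p * (p - 1) // 2


def kmo(r_matrix):
    """Overall KMO (undefined for an identity matrix, where both sums are 0)."""
    p = len(r_matrix)
    inv, _ = inverse_and_determinant(r_matrix)
    sum_r2 = sum_partial2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                partial = -inv[i][j] / math.sqrt(inv[i][i] * inv[j][j])
                sum_r2 += r_matrix[i][j] ** 2
                sum_partial2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_partial2)


def equicorrelated(p, r):
    """Hypothetical R matrix in which every pair of items correlates at r."""
    return [[1.0 if i == j else r for j in range(p)] for i in range(p)]


identity = equicorrelated(5, 0.0)  # 1s on the diagonal, 0s elsewhere
related = equicorrelated(5, 0.3)

chi_identity, df = bartlett_sphericity(identity, 100)
chi_related, _ = bartlett_sphericity(related, 100)
kmo_related = kmo(related)

print(f"Identity matrix: chi-square({df}) = {chi_identity:.2f}")
print(f"Correlated items: chi-square({df}) = {chi_related:.2f}")
print(f"Correlated items: KMO = {kmo_related:.2f}")
```

For the identity matrix the statistic is 0, so there is nothing for a PCA to find. For the correlated items it is about 61.6 on 10 degrees of freedom, well above the critical value of 18.31 at p = .05, and the KMO of about .78 falls in the “good” band of the Hutcheson and Sofroniou cut-offs.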