Data Reduction: Exploratory Factor Analysis (EFA) Flashcards
What is the difference between PCA and EFA?
in PCA we don’t care why our variables are correlated, our only goal is to reduce the number of variables
in EFA we believe there are underlying causes as to why our variables are correlated and we have 2 goals:
1) reduce the number of variables
2) learn about and model the underlying (latent) causes of variables
What are latent variables?
latent variables are a theorised common cause of responses to a set of variables
- they explain correlations between measured variables
- their existence is assumed (held to be true)
- there is no direct test of this theory
PCA does not have latent variables
Practical steps
How do we move from data and correlations to EFA?
1) check the appropriateness of the data and decide the appropriate estimator
2) decide which methods to use to select the number of factors (same methods as for PCA)
3) decide conceptually whether to apply rotation and how to do so
4) decide criteria to assess and modify your solution
5) run the analysis
6) evaluate the solution (decided in step 4)
7) select a final solution and interpret the model, labelling the results
8) report your results
Interpreting EFA output
Factor loadings
the loading table has one column per factor (e.g. M1, M2) and one row per measured variable, with the loadings as the numbers
the numbers show the relationship of each measured variable to each factor
we interpret our factor models by the pattern and size of these loadings
Interpreting EFA output
Factor loadings
What are primary loadings
= a variable's highest loading, i.e. its loading on the factor it loads on most strongly
Interpreting EFA output
Factor loadings
What are cross-loadings?
= all the other factor loadings (everything except the primary loading) for a given measured variable
Interpreting EFA output
h2
= communality = explained item variance
the squared factor loadings (summed across factors) tell us how much of an item's variance is explained
Interpreting EFA output
u2
= uniqueness
= unexplained item variance (1 − h2)
Interpreting EFA output
com
= item complexity; roughly, how many factors an item loads substantially on (values near 1 mean the item loads mainly on one factor)
Interpreting EFA output
SS loadings
= the sum of squared loadings for each factor (same as PCA)
they give the total variance in the items accounted for by that factor; dividing by the number of items gives the proportion of variance explained
Differences in PCA vs EFA
Dependent variable
PCA = the component
EFA = observed measures
Differences in PCA vs EFA
Independent variable
PCA = observed measures (x1, x2 …)
EFA = the factor (each item is regressed on the factor)
Differences in PCA vs EFA
Aim
PCA = explains as much variance in the measures (x1, x2, …) as possible
EFA = models the relationship (correlation) between the variables
Differences in PCA vs EFA
Components vs factors
PCA = components = are determinate (there is only one solution for the components)
EFA = factors = are indeterminate (there are infinitely many factor solutions that fit a given dataset equally well)
What does it mean to model the data in EFA
EFA tries to explain patterns of correlations
e.g. if there is a correlation between y1 and y2 due to some factor, then controlling for (partialling out) that factor should leave no correlation
if the model (the factors) is good, it will explain all the interrelationships between the items
What is the modification index?
it is the software (e.g. R) flagging: "the data show a correlation here that your model says is not there"
Variance in EFA
total variance
total variance = common variance + specific variance + error variance
Variance in EFA
true variance
common variance + specific variance
common variance = variance common to one item and at least one other item
specific variance = variance specific to an item that is not shared with any other items
Variance in EFA
unique variance
specific variance + error variance
specific variance = variance specific to an item that is not shared with any other items
error variance = random noise
EFA assumptions
1) the residuals/error terms should be uncorrelated
2) the residuals/error terms should not correlate with factors
3) relationships between items and factors should be linear (there are models that can account for non-linear relationships)
Data suitability
How do we know our data is suitable for EFA?
this boils down to: “is the data correlated?”
so initially we inspect the correlations to check they are at least moderate (>0.2)
Data suitability
squared multiple correlations (SMC)
this is another way to check our data is suitable for EFA
SMC = tells us how much of the variance in an item is explained by all the other items
the SMC for an item is the squared multiple correlation (R²) from regressing that item on the other (p-1) items
- this tells us how much variation is shared between an item and all the other items
this is one way to estimate communalities
Estimating EFA
What do we do?
for PCA we use eigendecomposition, but this is not an estimation method, it is simply a calculation
as we have a model for the data in EFA, we need to estimate the model parameters (primarily the factor loadings)
Estimating EFA
what are communalities?
communalities = estimates of how much of a variable's variance is common (shared with the other variables)
therefore they also indicate how much variance in an item is explained by the other variables
if we consider that EFA is trying to explain common variance, then communalities are more useful to us than total variance
Estimating EFA
Estimating communalities
Difficulties
Estimating communalities is hard as population communalities are unknown
- they range from 0 (no shared variance) to 1 (all variance is shared)
- Occasionally estimates will be >1 (Heywood cases)
- methods of estimation are often iterative and ‘mechanical’
Estimating EFA
Methods of estimating communalities
Principal axis factoring (PAF)
this approach uses SMCs as initial estimates of the communalities on the diagonal of the correlation matrix
1) compute initial communality estimates from the SMCs
2) eigendecomposition = once we have these reasonable lower bounds, we substitute the 1s on the diagonal of our correlation matrix with the SMCs from step 1
3) obtain the factor loadings from the eigenvalues and eigenvectors of the reduced matrix obtained in step 2
some versions of PAF iterate: replace the diagonal with the communalities implied by the loadings from step 3, redo step 3, replace the diagonal again, and so on until the estimates stabilise
Estimating EFA
Methods of estimating communalities
Method of minimum residuals (MINRES)
this is an iterative approach and the default of the fa() procedure in R
rather than working on the diagonal, it minimises the off-diagonal residual correlations
1) starts with some other solution e.g. PCA or principal axes, extracting a set number of factors
2) adjust the loadings of all factors on each variable so as to minimize the residual correlations for that variable
Estimating EFA
Methods of estimating communalities
Maximum likelihood estimation (MLE)
this is generally the best estimation method BUT it doesn't always work (the other two will work no matter how bad your data are)
the procedure finds values for the model parameters (the factor loadings) that maximise the likelihood of obtaining the observed covariance matrix
Estimating EFA
Methods of estimating communalities
Maximum likelihood estimation (MLE)
Advantages
- provides numerous 'fit' statistics that you can use to evaluate how well your model fits the data and to compare competing models
- MLE assumes a distribution for your data (e.g. normal distribution)
Estimating EFA
Methods of estimating communalities
Maximum likelihood estimation (MLE)
Disadvantages
- it is sometimes not possible to find a set of factor loadings that maximises the likelihood - this is referred to as non-convergence
- MLE may produce impossible values of factor loadings (e.g. Heywood cases) or factor correlations (e.g. >1)
*MLE assumes data is continuous and this is not always the case
How to select a number of factors to keep in EFA?
We use the same methods mentioned in PCA
- variance explained …
- scree plots
- MAP
- parallel analysis
Use all of these to decide a plausible number of factors
- use MAP as the minimum
- use parallel analysis as the maximum
Factor rotation
Why are factor solutions hard to interpret?
- the pattern of factor loadings is not always clear
- the difference between primary and cross loadings can be small
Factor rotation
What is rotational indeterminacy?
= it means that there are an infinite number of pairs of factor loadings and factor score matrices which will fit the data equally well and are thus indistinguishable by any numeric criteria
in other words - there is no one unique solution to the factor problem
this is why the theoretical coherence of a model plays a bigger role in EFA than PCA
Factor rotation
Analytic rotation
Rotation aims to maximise the relationship of a measured item with a factor
= make primary loadings big and cross loadings small
the unrotated loadings are often noisy and hard to see patterns in, so we rotate to simplify them
although we can't tell the rotated solutions apart numerically, we can select the rotation that gives the most coherent solution
Factor rotation
Simple structure
All factor rotations seek to optimize one or more aspects of simple structure:
1) each variable (row) should have at least one 0 loading
2) each factor (column) should have at least as many 0 loadings as there are factors
3) every pair of factors (columns) should have several variables which load on one factor but not the other
4) when four or more factors are extracted, each pair of factors should have a large proportion of variables that do not load on either factor
5) every pair of factors should have only a few variables that load on both factors
Factor rotation
Orthogonal rotation
Correlations between factors are 0
Axes are at right angles
Includes varimax and quartimax rotations
Factor rotation
Oblique rotation
This method is most recommended
Correlations between factors are NOT 0
this is useful because it is closer to reality, and since the analysis is exploratory there is no need to impose the constraint that factors are uncorrelated
Axes are NOT at right angles
Includes promax and oblimin rotations
Factor rotation
Oblique rotation interpretation
Pattern matrix
pattern matrix = matrix of regression weights (loadings) from factors to variables
Factor rotation
Oblique rotation interpretation
Structure matrix
structure matrix = matrix of correlations between factors and variables
Structure matrix = pattern matrix multiplied by factor correlations
(in orthogonal rotations, structure and pattern matrices are the same)
Evaluating results
Checking the results
start by examining how much variance each factor accounts for and the total amount of variance
we evaluate factors based on the size and sign (+/-) of the loadings deemed salient (generally loadings >0.3)
Evaluating results
Checking for trouble
REMEMBER - if you delete any items you must re-run FA starting from when you figure out how many factors to extract
*Heywood cases
= items with loadings >1 → this means something is wrong and you should not trust these results
*items with no salient loadings?
= could signal a problem item, which should be removed
= could signal an additional factor
*items with multiple salient loadings (cross-loadings)?
= indicated by item complexity values
*do any factors load on <3 items?
= 3 should be the minimum
= may have over-extracted
= might have too few items
Evaluating results
EFA checklist
✅ all factors load on 3+ items at salient levels
✅ all items have at least one loading above the salient cut off
✅ No Heywood cases
✅ complex items are removed (in accordance with the research goals)
✅ solution accounts for an acceptable level of variance (given in the research goals)
✅ item content of factors is coherent and substantively meaningful
Factor Congruence
Replicability
It is always good to test whether your study replicates well. This can be done by:
- collecting data on another sample
- splitting one large sample into two
then we can use one sample for exploratory analysis and the other for confirmatory analysis
There are numerous methods for this
Factor Congruence
Replicability
congruence coefficients
= correlations between vectors of factor loadings across samples
“how similar are the loadings for M1 across the two samples?”
“how similar are they for M2”
Calculating congruence:
1) run factor model on sample 1
2) run factor model on sample 2
* ensure the same items are included and same number of factors specified
3) Calculate congruence
Factor Congruence
Replicability
(Tucker’s) congruence coefficients
= measures similarity independent of the mean size of the loadings
it is insensitive to a change in the sign of any pair of loadings
Basics:
<0.68 = terrible
>0.9 = good
>0.98 = excellent
Factor Congruence
Replicability
Confirmatory FA (CFA)
This is the better solution
In EFA all factors load on all items - these loadings are purely data driven
In CFA we specify a model and test how well it fits the data
- we explicitly state which items relate to which factor
- we can test if the loadings are the same in different samples / groups / across time etc
Factor scores
What are factor scores?
they provide variables representing the factors we extracted in EFA, so we can use the constructs in further analyses
they use different pieces of information from the factor solution to compute a weighted score
- the scores are a combination of observations, factor loadings and factor correlations (method dependent)
Factor scoring
Types of scores
Sum scoring (unit weighting)
This is the simplest approach to factor scoring
= sum the raw scores on the observed variables which have primary loadings on each factor
- which items to sum is a matter of defining what loadings are salient
These require strict properties to be present in the data (but these are rarely tested)
Factor scoring
Types of scores
Ten Berge Scores
this is the preferred method
= focuses on producing factor scores whose correlations match the factor correlations
Factor scoring
Types of scores
Structural equation modelling (SEM)
= includes a measurement component (CFA) and a structural component (regression)
- doesn’t require you to compute factor scores
- requires good theory of measurement and structure
if your constructs don’t approximate simple structure you may have to turn to alternatives
How do you determine sample size in FA?
in the past, rules were based around the participant to item (N:p) ratio
BUT the crucial determinants are the communalities and the item-to-factor ratio (p:m)
fewer participants are needed if:
- communalities are high and wide
- p:m is high (e.g. 20:7) - communality level and p:m also interact
general rule of psychology = more is better
GIGO
garbage in = garbage out
always check the quality of your data
PCA and FA cannot turn bad data into good data
Reliability
Aim of measurement
= to develop and use measurements of constructs to test psychological theories
Reliability
Measurement
Classical test theory
describes data from any measure as a combination of:
- the signal of the construct / the ‘true score’
- noise or 'error': measurement of other, unintended things
observed score = true score + error
Reliability
Measurement
True score theory
if we assume about our test that:
1) it measures some ability or trait
2) in the world, there is a ‘true’ value or score for this test for each individual
then the reliability of the test is a measure of how well it reflects the true score
Reliability
Parallel tests
under certain assumptions (parallelism, tau equivalence, congeneric tests) the correlation between two parallel tests of the same construct provides an estimate of reliability
Parallel tests can come from several sources
Reliability
Sources of Parallel tests
Alternative forms of reliability
= correlation between two variants of a test
e.g. randomisation of stimuli, similar but not identical tests
Alternative tests (should) have equal means and variances
if the tests are perfectly reliable, then they should correlate perfectly
↳ since they won’t - this deviation provides the measure of reliability
alternatives can be expensive and time consuming - but they are becoming easier
Reliability
Sources of Parallel tests
Split-half reliability
= indicates how internally consistent a measure is
1) split the n items (randomly) into a pair of equal subsets
2) score the two subsets
3) correlate the scores
With an increasing number of items, the number of random splits becomes increasingly large
Reliability
Sources of Parallel tests
Split-half reliability
Cronbach’s alpha
the best known estimate for split half reliability is Cronbach’s alpha
- tells us “to what extent are observations consistent across items”
BUT it does not indicate whether items measure one unidimensional construct
Cronbach's alpha increases as you increase the number of items
this relationship between test length and reliability is described by the Spearman-Brown prophecy formula
Reliability
Sources of Parallel tests
Split-half reliability
McDonald’s Omega
Any item may measure:
- a general factor that loads on all items
- a group or specific factor that loads on a subset of items
Given this, we can derive two internal consistency measures
1) Omega hierarchical (ωh) = the proportion of item variance that is due to the general factor
2) Omega total (ωt) = the total proportion of reliable item variance
These are much more robust to the structure of your data and to how the measure will behave in practice
Reliability
Sources of Parallel tests
Test-retest reliability
= correlation between tests taken at (at least) two different points in time
poses tricky questions:
- what is the appropriate time between measures?
- how stable should the construct be if we are to consider it a trait?
Reliability
Sources of Parallel tests
Interrater reliability
= do all the raters involved give consistent ratings?
We can determine interrater reliability by means of intraclass correlation coefficients
Reliability
Sources of Parallel tests
Interrater reliability
Intraclass correlation coefficients
this splits variance of ratings into multiple components:
- variance between subjects (across targets)
- variance within subjects (across raters, same targets)
- variance due to raters (same rater, across targets)
Uses of reliability?
it is useful to know how reliable our measure is for:
- its implications for validity
- it also allows us to correct for attenuation (estimates of effects are limited by the reliability of the measures)
Validity
What is validity?
there are debates over the definition but basically:
it determines whether a test really measures what it is supposed to measure
debates over the definition lead to debates over what counts as evidence for validity
Evidence of validity …
… related to content
Content validity
= a test should contain only content relevant to the intended construct
= it should measure what it is intended to measure
Evidence of validity …
… related to content
Face validity
= does the test ‘appear’ to measure what it was designed to measure?
Evidence of validity …
… related to scale
Construct validity
= do the items measure a single intended construct
this is the most important
FA provides limited information towards this
Evidence of validity …
… relationships with other concepts
Convergent validity
= measure should have high correlations with other measures of the same construct
Evidence of validity …
… relationships with other concepts
Discriminant validity
= measure should have low correlations with measures of different constructs
Evidence of validity …
… relationships with other concepts
Nomological Net validity
= measure should show the expected pattern of (+/-) correlations with different sets of constructs
also, scores on some measures should change in expected ways in response to manipulations
Evidence of validity …
… relationships in terms of temporal sequence
Concurrent validity
= correlations with contemporaneous measures (tests done at the same time)
Evidence of validity …
… relationships in terms of temporal sequence
Predictive validity
= related to expected future outcomes (longitudinal)
Evidence of validity …
… related to response processes
not commonly considered in validation studies
e.g. do tests of intelligence engage ‘problem solving’ behaviours
Evidence of validity …
… related to consequences
= should the potential consequences of the test be considered part of the evidence for the test's validity?
Important considerations for the use of tests:
- is the measure systematically biased or is it fair for all groups of test takers?
- does bias have social ramifications?
Reliability vs Validity
reliability = relation of true score to observed score
validity = correlations with other measures plays a key role
A score/measure cannot correlate with anything more strongly than it correlates with itself; therefore reliability places an upper limit on validity