Part 3 (Final) Flashcards
Name 4 areas where it is important to follow ethical guidelines in psychological assessment
test/questionnaire selection
scoring & interpretation
participant recruitment
treatment of participants
What does the "treatment of participants" ethical guideline entail
Responsible treatment of participants
Even decisions made prior to seeing any participants are part of their ethical treatment
e.g. understand your measures, follow appropriate procedures, ensure participant wellbeing
What is included in informed consent
It includes things like estimating the time commitment, informing participants of their right to withdraw, and sharing your goals but not your hypotheses.
Name some elements of the ethical guideline "basics of wellbeing"
- informed consent
- assured confidentiality, as much as possible
- deliver feedback appropriately
What is included in confidentiality
this includes practicing responsible data handling
open-access data must be anonymized, participants must be made aware of the broad distribution of their data, and this must be planned in advance
What is included in deliver feedback appropriately
ideally, there should be some benefit to the participants
usually, we cannot give participants any scores unless we already understand the measure
What are some regulations around children in research
Research participation by children aged 14-18 must contribute to a project that directly benefits them or children like them
Children under 14 must have parental approval, parents and children can withdraw at any time
Define Responsible Data Handling
No unauthorized access to your data (this includes the government)
Safe retention of data
anonymity goes a long way here - don’t collect personally identifying information if it’s not needed
traditionally, keep for 5 years then destroy (not necessarily anymore)
In testing, ensure the results won’t be interpreted inappropriately
Define Responsible Data Removal
Data removal is allowed, but needs to be done responsibly as well
Balancing need to respect participants’ contribution with the need to reach accurate conclusions
Participants have the right to try to derail our studies, but not the right to succeed in doing so
Remember, our statistics have assumptions we should meet; many statistics require complete data sets
What are the assumptions of factor analysis
Because it is based on the general factor model, factor analysis shares some important assumptions with more classical measurement models
- Errors are random and not correlated with the latent variable
- Correlations among items exist because they share a common latent variable(s)
What is a factor
Factor is another way we could refer to a latent variable
What is a component
Component is another way we could refer to a latent variable
Why would we use the words “factor” or “component” instead of “latent variable”?
The main reason to use these terms instead of latent variable is to better acknowledge that the unobserved influence was derived empirically
How can you identify factors
Factor analysis is a process of trying to identify the latent variable(s) that influenced our measurements
Items that have stronger associations (correlations) with each other but weaker associations with other items will form identifiable clusters
we’re capitalizing on similarities and differences in correlations across items
if all items correlate strongly, there will be only one factor identified
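A minimal sketch of this clustering idea, using simulated data and hypothetical item names (none of this comes from the course materials): items driven by the same latent variable correlate strongly with each other and weakly with the rest, and those blocks in the correlation matrix are the candidate factors.
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
anxiety = rng.normal(size=n)   # simulated latent variable 1
mood = rng.normal(size=n)      # simulated latent variable 2
items = pd.DataFrame({
    "worry":   anxiety + rng.normal(scale=0.7, size=n),
    "tension": anxiety + rng.normal(scale=0.7, size=n),
    "sadness": mood    + rng.normal(scale=0.7, size=n),
    "fatigue": mood    + rng.normal(scale=0.7, size=n),
})
# Blocks of strong correlations (worry/tension vs sadness/fatigue) suggest two factors.
print(items.corr().round(2))
```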
What is the main requirement for a good factor analysis
We need quite large data sets for factor analysis to give accurate results: larger than what we need for good internal consistency reliability or validity analyses
we should have about 50% more data for factor analysis than for reliability/validity analyses (roughly 300-400 participants) for it to be meaningful
What should you keep in mind as you run factor analyses (4)
- What you put into the analysis dictates what you get out of it
- Every item has the potential to create a factor, and influence the creation of other factors
- Adding or dropping even one item will change the outcome
- Factor analysis should be viewed as a process requiring many iterations, thus it is time consuming
When should a cluster be called a factor
Technically, we always find more than one ‘cluster’ or pattern of responses in a questionnaire - only the important ones get called factors
Researchers typically select as factors any components with an eigenvalue > 1.0
The eigenvalue is a measure of the amount of information captured by a factor (or component), expressed in units of items
An eigenvalue of 1 is generally seen as indicating the factor captures as much information as one typical (good) item
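A minimal sketch of the eigenvalue-greater-than-1 rule on simulated data (the data and numbers are illustrative assumptions, not course output): compute the eigenvalues of the items' correlation matrix and count how many exceed 1.
```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 1))                        # one simulated latent variable
items = latent + rng.normal(scale=0.8, size=(300, 6))     # six items influenced by it
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]     # largest first
print(eigenvalues.round(2))
print("Factors with eigenvalue > 1:", int((eigenvalues > 1).sum()))
```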
What is a parallel analysis
Parallel analysis creates random data with the same number of variables and observations as your data. A correlation matrix is computed from the random data and its eigenvalues are calculated
When the eigenvalues from the random data are larger than the eigenvalues from your real data, you know the items in those factors are not correlated any better than random noise
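A minimal sketch of parallel analysis with simulated "real" data (an illustrative assumption, not jamovi's implementation): average the eigenvalues from many purely random data sets of the same size and keep only factors whose real eigenvalue exceeds the random benchmark.
```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_items = 300, 8

# Simulated "real" data: two latent variables each driving four items.
latents = rng.normal(size=(n_obs, 2))
real = np.repeat(latents, 4, axis=1) + rng.normal(scale=0.9, size=(n_obs, n_items))
real_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(real, rowvar=False)))[::-1]

# Parallel analysis benchmark: average eigenvalues over 200 random data sets.
sim_eigs = np.mean([
    np.sort(np.linalg.eigvalsh(np.corrcoef(rng.normal(size=(n_obs, n_items)),
                                           rowvar=False)))[::-1]
    for _ in range(200)
], axis=0)

# Count how many leading real eigenvalues beat the random ones.
n_keep = int(np.sum(np.cumprod(real_eigs > sim_eigs)))
print("Real:  ", real_eigs.round(2))
print("Random:", sim_eigs.round(2))
print("Factors to keep:", n_keep)
```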
How do you interpret a scree plot based on eigenvalues against a parallel analysis
All the eigenvalues are plotted, and so are the simulated eigenvalues
Here the simulated eigenvalues come from the randomly generated parallel analysis data set
What jamovi will call a factor is any blue dot that appears before the first time a yellow dot rises above the blue dots
Factors are just numbered sequentially, after sorting the eigenvalues largest-to-smallest
How do you interpret a traditional scree plot
The same eigenvalues are plotted, it’s just the comparison line that is different
Looking at the scree plot, ask where the "mountain" changes to "scree"
The number of factors to the left of where we think the scree starts is how many we should keep
What are some changes you can make to an exploratory analysis that will make a difference
changes worth considering: dropping one or more item(s), adding a new item, using rotation
What are some changes you can make to an exploratory analysis that will not make a difference
changes that won’t impact anything: removing participants with missing data, reverse-scoring
Briefly describe exploratory factor analysis
Exploratory factor analysis is iterative, you repeat until a good solution arises
Identifying the number of factors is based on the eigenvalues, using whatever method makes sense
Explain the different rotations for EFA
Rotation allows us to spread the variability among our factors more evenly
There are two main forms of rotation worth considering
Orthogonal: this maximizes the squared variance in the factor loadings - clusters are as different as possible, unrelated
Oblique: this maintains some relationship between the factors; clusters are different but related things
What is the varimax rotation for EFA
Tries to maximize the differences between your clusters by rotating the factor axes
This is called varimax or quartimax rotation in jamovi
Varimax rotation is very commonly used. It makes it easier to identify the differences between clusters of items (or which items best represent which cluster)
What is the oblique rotation for EFA
Tries to maintain some of the association between your clusters (and measure it) while rotating the factor axes
This is called oblimin or promax rotation in jamovi
Oblimin rotation is used to minimize the squared loading covariance of the factors, while allowing them to be correlated
as different as possible while still being related
This oblimin rotation is, obviously, not an orthogonal rotation
This means the horizontal and vertical axis (for 2 factors) aren’t at 90 degrees
The strength of the correlation dictates the extent of the departure from a 90-degree angle
The results obtained in jamovi would look the same as with a varimax rotation, or even an unrotated solution, but the factor loadings will differ
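A minimal sketch of rotation outside jamovi, using scikit-learn's FactorAnalysis on simulated data (the data and two-factor structure are assumptions for illustration). scikit-learn only offers orthogonal rotations ("varimax", "quartimax"); oblique rotations such as oblimin require other packages.
```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
latents = rng.normal(size=(400, 2))
items = np.repeat(latents, 3, axis=1) + rng.normal(scale=0.8, size=(400, 6))

unrotated = FactorAnalysis(n_components=2).fit(items)
rotated = FactorAnalysis(n_components=2, rotation="varimax").fit(items)

# After rotation each item loads strongly on one factor and weakly on the other,
# which makes the clusters easier to interpret.
print("Unrotated loadings:\n", unrotated.components_.T.round(2))
print("Varimax loadings:\n", rotated.components_.T.round(2))
```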
Where do we expect the variance to come from for EFA
Multiple sources of variance
In Exploratory Factor Analysis we expect the latent variable(s) to be only part of the forces acting on our measurements. The other influences are probably just random error, though.
That means the eigenvalues we calculate are affected by our conception of how to attribute the sources of variance
What is principal component analysis
In Principal Component Analysis we expect all variance to come from common sources among the items, though not necessarily to an equal extent in each
This means there is no choice of extraction method, as these options under EFA are based on different ideas of the sources of variance
It’s quite possible for some of the common variance to still be error, though; accordingly, we still want to discard any ‘factors’ that seem too trivial - on the basis of their eigenvalues
When should you use PCA or EFA
Technically, PCA is not considered factor analysis. You should use it if you want to maximize the amount of explained variance
If someone asks you to do a factor analysis, you should do EFA
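A minimal sketch of the contrast on simulated items (illustrative assumptions only): a PCA component absorbs all the variance it can, while EFA sets aside unique/error variance for each item, which is why PCA "explains" more.
```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(4)
latent = rng.normal(size=(400, 1))
items = latent + rng.normal(scale=1.0, size=(400, 5))

pca = PCA(n_components=1).fit(items)
efa = FactorAnalysis(n_components=1).fit(items)

print("PCA variance explained by component 1:",
      round(float(pca.explained_variance_ratio_[0]), 2))
print("EFA loadings:", efa.components_.round(2))
print("EFA unique (noise) variances:", efa.noise_variance_.round(2))
```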
Describe the EFA and the CFA
Exploratory Factor Analysis (EFA) allows very little control over the latent variables, theories are introduced through rotation or forcing a set number of factors
Confirmatory Factor Analysis (CFA) requires you to exercise control over which items theoretically go with which latent variable; with the right software, not jamovi (yet), you could even specify expectations for how strong the relations should be
What should be the process for CFA
If you go back and look at the process for EFA, it has a very similar structure
The main difference here is that we need an a priori theory and we need to check that theory
The examination of the model fit dictates whether we are done or need to revise our model or our theory
What statistics should you look at in a CFA
use standardized estimates, preferably, recognizing that each of the factor loadings should be significant (p < .05)
examine the chi-squared as well as the CFI, TLI & RMSEA statistics
you want the chi squared to be non-significant, though that’s unlikely
you want the CFI & TLI to be above .95 (below .90 is very bad)
you want the RMSEA to be below .05 (not more than .08 upper CI)
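A minimal sketch that simply encodes the cut-offs above as a checklist (the function name and example values are hypothetical; this is not jamovi output or a CFA implementation).
```python
def check_cfa_fit(chi_sq_p: float, cfi: float, tli: float,
                  rmsea: float, rmsea_upper_ci: float) -> dict:
    """Return pass/fail flags for the common CFA fit criteria listed above."""
    return {
        "chi_sq_nonsignificant": chi_sq_p > .05,   # desirable but unlikely
        "cfi_ok": cfi > .95,                       # below .90 is very bad
        "tli_ok": tli > .95,
        "rmsea_ok": rmsea < .05 and rmsea_upper_ci <= .08,
    }

# Hypothetical fit statistics, for illustration only.
print(check_cfa_fit(chi_sq_p=.01, cfi=.97, tli=.96, rmsea=.04, rmsea_upper_ci=.07))
```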
Give the definitions of "standardized" (3)
- turning into z-scores
- norming
- an expected set of procedures; that the items on a questionnaire were previously established and we didn't change them; that a measure has established normative information available; or some combination of all these. "Unstandardized" means one or more of these is missing
What is a proprietary measure, and how do standardized and unstandardized measures compare
Another big difference is that many standardized measures are proprietary; that is, you have to pay to use them.
Standardized: Controlled administration conditions, normative information available, interpretation guidelines available, instruction manual available, carefully vetted (?), and probably costly
Unstandardized: May not require controlled conditions, typically lacks normative information, less interpretable for giving feedback, mainly useful for research purposes
Are proprietary measures of personality better?
for measuring personality, free tests are better: more reliable and efficient
What is a hierarchical factor analysis
You may recall I mentioned that it’s possible to theorize and (with different software) test multiple levels of latent variables. There’s a name for this.
Hierarchical Factor Analysis is one where you have proposed one or more latent variable(s) causing changes in one or more other latent variable(s)
The idea here is you have a primary latent variable that influences the other latent variables
Why use a hierarchical factor analysis
We’re often focused on measuring just one factor, but this isn’t always what we want.
A test for a course may not capture just one construct
An IQ test captures G (overall intelligence), but it also has multiple components within it
A personality “test” assesses many aspects of personality; the Big 5: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism
they aren't really tests, they are questionnaires
What inter-item correlations should we expect between latent variables in a hierarchical factor analysis
above .3, no more than .8
Why use experiments
In experimental research we are usually trying to manipulate something, and to observe the effect of that manipulation on an outcome.
Many studies use performance on some task as their outcome
Often, they just compare two different groups of people as the “manipulation”
Very rarely are psychometrics evaluated in these contexts
Why should we evaluate the psychometric properties of experiments
If we don’t, we may:
Make less-than-optimal design decisions (e.g., power, task selected)
Fail to find predicted results
Incorrectly interpret results
Define discrimination, reliability and validity in a cognitive task
Discrimination: Scores aren’t entirely random (e.g., high/low performance by some individuals)
similar to good distribution and range in a questionnaire; can be thought of as criterion validity
Reliability: Scores are a precise estimate of one’s ability
Validity: Scores reflect the right cognitive faculty
What is the risk of assuming the same reliability for control and treatment groups
The control group may not have the same reliability, because of response biases
Assuming reliability is equal could lead to incorrect conclusions
What is the difference between tests and questionnaires
For tests, items have right and wrong answers, whereas for questionnaires they do not (except with respect to the truth of the response for that individual).
validity, reliability but also suitability of purpose is important for tests
We are still interested in psychometric qualities for tests, but not all concepts apply equally well:
Internal consistency reliability may not make much sense
This depends on how many factors we expect to capture using the test, and how strongly correlated the factors are
Define discrimination and difficulty for tests
For proper tests, there are two important statistics to consider – in addition to reliability and validity.
Discrimination: The ability to accurately separate individuals into high/low performers
Discrimination is based on a relatively simple formula, especially compared with everything else we’ve seen.
Discrimination = Ph/Nh - Pl/Nl, where:
Ph = number of people in the high-performing group responding correctly
Pl = number of people in the low-performing group responding correctly
Nh = number of people in the high-performing group
Nl = number of people in the low-performing group
Difficulty: the likelihood of getting a question wrong (1 minus the proportion of correct responses)
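A minimal sketch of the discrimination index for one hypothetical item scored 1 (correct) or 0 (incorrect); the response vectors and the high/low split are illustrative assumptions.
```python
import numpy as np

# Item scores for the high- and low-performing groups
# (the split itself would come from total test scores).
high_group = np.array([1, 1, 1, 0, 1, 1, 1, 1, 0, 1])
low_group  = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0, 0])

discrimination = high_group.mean() - low_group.mean()   # Ph/Nh - Pl/Nl
print(round(float(discrimination), 2))                  # closer to 1 = better separation
```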
What is the goal of discrimination in tests
Remember, the goal of discrimination is good categorization of individual cases into high and low performers.
That means we ideally have approximately half of people getting items wrong, and we want to be able to predict in which half an individual falls with relatively good accuracy – for each item
So, there is typically a connection between discrimination ability and item difficulty
How do you calculate difficulty in tests
The item’s discrimination score should be interpreted in light of the item’s difficulty score.
Difficulty is 1 minus the proportion of correct responses from all individuals
Difficulty = 1 - Correct/N
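A minimal sketch of the difficulty formula for the same kind of 0/1 item data (the responses are made up for illustration).
```python
import numpy as np

responses = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1])   # 1 = correct
difficulty = 1 - responses.mean()                             # 1 - Correct/N
print(round(float(difficulty), 2))                            # ~.50 is optimal for research use
```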
How would you optimize a test for research
If you are designing a test for research purposes, you want to maximize the variability in the scores (think, full range of possible responses).
That means you should aim for most items to have a difficulty score near the optimal .50
This gives you equal room above and below, across participants
With optimal difficulty, you have optimized the probability of getting a high discrimination score – though it is by no means certain
This means your item can separate good performers from poor performers
What are some psychological considerations when creating tests
People are naturally inclined to want to do “well” on their tests. For research, this may or may not be a key concern – it depends on the test.
Getting just over 50% on a test doesn't fit most people's definition of "doing well", even if it is above average
Pretty much only psychometricians would think this way
So, sometimes you need to sacrifice some psychometric quality to accommodate psychological considerations
Considering a test as a whole, there may be utility in having some very difficult items even if that means they have poor discrimination (this would depend on your choice of split, of course)
How can you get around psychological considerations of tests
If it’s an option, standardization can help resolve the contrast between good psychometrics and psychological considerations.
In this case, standardization means taking the raw scores, which are likely very low, and converting them into a new score that looks more acceptable
Percentiles would be one way of doing it
IQ scores typically use standardization via Z scores, where a Z of 0 becomes 100 (a psychologically pleasing number) and each Z difference of 1 adds or subtracts 15 points
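A minimal sketch of this z-score standardization on hypothetical raw scores (the numbers are made up): convert to z-scores, then rescale to a mean of 100 and an SD of 15.
```python
import numpy as np

raw = np.array([12, 15, 9, 20, 14, 11, 17, 13])   # hypothetical raw test scores
z = (raw - raw.mean()) / raw.std(ddof=1)           # convert to z-scores
iq_style = 100 + 15 * z                            # rescale to mean 100, SD 15
print(iq_style.round(1))
```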
If you have multiple factors in your factor analyses, how should you assess reliability?
Run one reliability analysis per factor (e.g., 2 factors = 2 reliability analyses)
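A minimal sketch of per-factor reliability using Cronbach's alpha; the item groupings and data are simulated assumptions (in practice the groupings come from your factor analysis).
```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: observations x items matrix for one factor's items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(5)
latents = rng.normal(size=(300, 2))
factor1_items = latents[:, [0]] + rng.normal(scale=0.7, size=(300, 4))
factor2_items = latents[:, [1]] + rng.normal(scale=0.7, size=(300, 4))

# One reliability analysis per factor, as described above.
for name, block in [("Factor 1", factor1_items), ("Factor 2", factor2_items)]:
    print(name, "alpha =", round(cronbach_alpha(block), 2))
```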
Contrast differences between PCA and EFA
For PCA, it is the covariance matrix rather than the correlation matrix, and there is no real definition of random error
PCA tends to produce higher correlations because what EFA would attribute to error (unexplained variance), PCA attributes to covariance (explained variance)
What is the advantage of doing factor analysis rather than just averaging out correlations
Factor analyses can account for error and make better use of the "true score", allowing us to build more precise models
What do we need for good measurements
We need good theories
We need to capture all our dimensions well
We need reliability
We need convergent validity (and ideally also discriminant validity)
Describe the issue of content validity with the depression scales
There are 280 different measures of depression.
The most cited scale for depression is the CES-D
The CES-D has 20 items and 1/3 of them do not appear in any other commonly used measure of depression
In other words, the other 279 research teams did not agree that a third of those questions should be used to measure the construct of depression
this could lead us to believe this scale measures depression plus another component
This could lead to p-hacking, where one chooses the depression scale that fits their hypothesis and rejects the ones that don't in order to demonstrate significance
Describe the issue of reliability with the depression scales
The situation isn’t any better when looking at reliability for depression.
Inter-rater reliability of major depression diagnoses was .28 (presumably this is an r)
Cronbach’s alpha has many limitations, but in most cases is the only statistic ever used to demonstrate reliability
At least 20% of studies don’t even report alpha
Some report other people’s alpha, not in relation to their own sample
There is a disconnect between construct validity and reliability, even though they should go together
How is removing items from a scale bad for validity
Adding or removing items is a common practice, without providing a clear reason for doing so
Changing items isn’t necessarily a bad thing; the lack of justification is
Many studies don’t try to demonstrate validity, and many that do simply provide a citation to some other study