Part 3 (Final) Flashcards
Name 4 areas where it is important to follow ethical guidelines in psychological assessment
test/questionnaire selection
scoring & interpretation
participant recruitment
treatment of participants
What does the "treatment of participants" ethical guideline entail
Responsible treatment of participants
Even decisions made prior to seeing any participants are part of their ethical treatment
e.g. understand your measures, follow appropriate procedures, ensure participant wellbeing
What is included in informed consent
It includes things like estimating the time commitment, informing participants of their right to withdraw, and sharing your goals but not your hypotheses.
Name some elements of the ethical guideline "basics of wellbeing"
- informed consent
- assured confidentiality, as much as possible
- deliver feedback appropriately
What is included in confidentiality
this includes practicing responsible data handling
open-access data must be anonymized, participants must be made aware of the broad distribution of their data, and this must be planned in advance
What is included in deliver feedback appropriately
ideally, there should be some benefit to the participants
usually, we cannot give participants any scores unless we already understand the measure
What are some regulations around children in research
Research participation by children aged 14-18 must contribute to a project that directly benefits them or children like them
Children under 14 must have parental approval, parents and children can withdraw at any time
Define Responsible Data Handling
No unauthorized access to your data (this includes the government)
Safe retention of data
anonymity goes a long way here - don’t collect personally identifying information if it’s not needed
traditionally, keep for 5 years then destroy (not necessarily anymore)
In testing, ensure the results won’t be interpreted inappropriately
Define Responsible Data Removal
Data removal is allowed, but needs to be done responsibly as well
Balancing need to respect participants’ contribution with the need to reach accurate conclusions
Participants have the right to try to derail our studies, but not the right to succeed in doing so
Remember, our statistics have assumptions we should meet; many statistics require complete data sets
What are the assumptions of factor analysis
Because it is based on the general factor model, factor analysis shares some important assumptions with more classical measurement models
- Errors are random and not correlated with the latent variable
- Correlations among items exist because they share a common latent variable(s)
What is a factor
Factor is another way we could refer to a latent variable
What is a component
Component is another way we could refer to a latent variable
Why would we use the words “factor” or “component” instead of “latent variable”?
The main reason to use these terms instead of latent variable is to better acknowledge that the unobserved influence was derived empirically
How can you identify factors
Factor analysis is a process of trying to identify the latent variable(s) that influenced our measurements
Items that have stronger associations (correlations) with each other but weaker associations with other items will form identifiable clusters
we’re capitalizing on similarities and differences in correlations across items
if all items correlate strongly, there will be only one factor identified
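A minimal sketch of this clustering idea, using simulated data and hypothetical item names (none of this comes from the course materials): items driven by the same latent variable correlate strongly with each other and weakly with the rest, and those blocks in the correlation matrix are the candidate factors.
```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 300
anxiety = rng.normal(size=n)   # simulated latent variable 1
mood = rng.normal(size=n)      # simulated latent variable 2
items = pd.DataFrame({
    "worry":   anxiety + rng.normal(scale=0.7, size=n),
    "tension": anxiety + rng.normal(scale=0.7, size=n),
    "sadness": mood    + rng.normal(scale=0.7, size=n),
    "fatigue": mood    + rng.normal(scale=0.7, size=n),
})
# Blocks of strong correlations (worry/tension vs sadness/fatigue) suggest two factors.
print(items.corr().round(2))
```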
What is the main requirement for a good factor analysis
We need quite large data sets for factor analysis to give accurate results: larger than what we need for good internal consistency reliability or validity analyses
we should have about 50% more data for factor analysis than for reliability/validity analyses (roughly 300-400 participants) for it to be meaningful
What should you keep in mind as you run factor analyses (4)
- What you put into the analysis dictates what you get out of it
- Every item has the potential to create a factor, and influence the creation of other factors
- Adding or dropping even one item will change the outcome
- Factor analysis should be viewed as a process requiring many iterations, thus it is time consuming
When should a cluster be called a factor
Technically, we always find more than one ‘cluster’ or pattern of responses in a questionnaire - only the important ones get called factors
Researchers typically select as factors any components with an eigenvalue > 1.0
The eigenvalue is a measure of the amount of information captured by a factor (or component), expressed in units of items
An eigenvalue of 1 is generally seen as indicating the factor captures as much information as one typical (good) item
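A minimal sketch of the eigenvalue-greater-than-1 rule on simulated data (the data and numbers are illustrative assumptions, not course output): compute the eigenvalues of the items' correlation matrix and count how many exceed 1.
```python
import numpy as np

rng = np.random.default_rng(1)
latent = rng.normal(size=(300, 1))                        # one simulated latent variable
items = latent + rng.normal(scale=0.8, size=(300, 6))     # six items influenced by it
corr = np.corrcoef(items, rowvar=False)
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]     # largest first
print(eigenvalues.round(2))
print("Factors with eigenvalue > 1:", int((eigenvalues > 1).sum()))
```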
What is a parallel analysis
Parallel analysis creates random data with the same number of variables and observations as your data. A correlation matrix is computed from the random data and its eigenvalues are calculated
When the eigenvalues from the random data are larger than the eigenvalues from your real data, you know the items in those factors are not correlated any better than random noise
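A minimal sketch of parallel analysis with simulated "real" data (an illustrative assumption, not jamovi's implementation): average the eigenvalues from many purely random data sets of the same size and keep only factors whose real eigenvalue exceeds the random benchmark.
```python
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_items = 300, 8

# Simulated "real" data: two latent variables each driving four items.
latents = rng.normal(size=(n_obs, 2))
real = np.repeat(latents, 4, axis=1) + rng.normal(scale=0.9, size=(n_obs, n_items))
real_eigs = np.sort(np.linalg.eigvalsh(np.corrcoef(real, rowvar=False)))[::-1]

# Parallel analysis benchmark: average eigenvalues over 200 random data sets.
sim_eigs = np.mean([
    np.sort(np.linalg.eigvalsh(np.corrcoef(rng.normal(size=(n_obs, n_items)),
                                           rowvar=False)))[::-1]
    for _ in range(200)
], axis=0)

# Count how many leading real eigenvalues beat the random ones.
n_keep = int(np.sum(np.cumprod(real_eigs > sim_eigs)))
print("Real:  ", real_eigs.round(2))
print("Random:", sim_eigs.round(2))
print("Factors to keep:", n_keep)
```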
How do you interpret a scree plot based on eigenvalues against a parallel analysis
All the eigenvalues are plotted, and so are the simulated eigenvalues
Here the simulated eigenvalues come from the randomly generated parallel analysis data set
What jamovi will call a factor is any blue dot that appears before the first time a yellow dot rises above the blue dots
Factors are just numbered sequentially, after sorting the eigenvalues largest-to-smallest
How do you interpret a traditional scree plot
The same eigenvalues are plotted, it’s just the comparison line that is different
Looking at the scree plot, ask where the "mountain" changes to "scree"
The number of factors to the left of where we think the scree starts is how many we should keep
What are some changes you can make to an exploratory analysis that will make a difference
changes worth considering: dropping one or more item(s), adding a new item, using rotation
What are some changes you can make to an exploratory analysis that will not make a difference
changes that won’t impact anything: removing participants with missing data, reverse-scoring
Briefly describe exploratory factor analysis
Exploratory factor analysis is iterative, you repeat until a good solution arises
Identifying the number of factors is based on the eigenvalues, using whatever method makes sense
Explain the different rotations for EFA
Rotation allows us to spread the variability among our factors more evenly
There are two main forms of rotation worth considering
Orthogonal: this maximizes the squared variance in the factor loadings - clusters are as different as possible, unrelated
Oblique: this maintains some relationship between the factors; clusters are different but related things
What is the varimax rotation for EFA
Tries to maximize the differences between your clusters by rotating the factor axes
This is called varimax or quartimax rotation in jamovi
Varimax rotation is very commonly used. It makes it easier to identify the differences between clusters of items (or which items best represent which cluster)
What is the oblique rotation for EFA
Tries to maintain some of the association between your clusters (and measure it) while rotating the factor axes
This is called oblimin or promax rotation in jamovi
Oblimin rotation is used to minimize the squared loading covariance of the factors, while allowing them to be correlated
as different as possible while still being related
This oblimin rotation is, obviously, not an orthogonal rotation
This means the horizontal and vertical axis (for 2 factors) aren’t at 90 degrees
The strength of the correlation dictates the extent of the departure from a 90-degree angle
The results obtained in jamovi would look the same as with a varimax rotation, or even an unrotated solution, but the factor loadings will differ
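A minimal sketch of rotation outside jamovi, using scikit-learn's FactorAnalysis on simulated data (the data and two-factor structure are assumptions for illustration). scikit-learn only offers orthogonal rotations ("varimax", "quartimax"); oblique rotations such as oblimin require other packages.
```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(3)
latents = rng.normal(size=(400, 2))
items = np.repeat(latents, 3, axis=1) + rng.normal(scale=0.8, size=(400, 6))

unrotated = FactorAnalysis(n_components=2).fit(items)
rotated = FactorAnalysis(n_components=2, rotation="varimax").fit(items)

# After rotation each item loads strongly on one factor and weakly on the other,
# which makes the clusters easier to interpret.
print("Unrotated loadings:\n", unrotated.components_.T.round(2))
print("Varimax loadings:\n", rotated.components_.T.round(2))
```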
Where do we expect the variance to come from for EFA
Multiple sources of variance
In Exploratory Factor Analysis we expect the latent variable(s) to be only part of the forces acting on our measurements. The other influences are probably just random error, though.
That means the eigenvalues we calculate are affected by our conception of how to attribute the sources of variance
What is principal component analysis
In Principal Component Analysis we expect all variance to come from common sources among the items, though not necessarily to an equal extent in each
This means there is no choice of extraction method, as these options under EFA are based on different ideas of the sources of variance
It’s quite possible for some of the common variance to still be error, though; accordingly, we still want to discard any ‘factors’ that seem too trivial - on the basis of their eigenvalues
When should you use PCA or EFA
Technically, PCA is not considered factor analysis. You should use it if you want to maximize the amount of explained variance
If someone asks you to do a factor analysis, you should do EFA
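A minimal sketch of the contrast on simulated items (illustrative assumptions only): a PCA component absorbs all the variance it can, while EFA sets aside unique/error variance for each item, which is why PCA "explains" more.
```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis

rng = np.random.default_rng(4)
latent = rng.normal(size=(400, 1))
items = latent + rng.normal(scale=1.0, size=(400, 5))

pca = PCA(n_components=1).fit(items)
efa = FactorAnalysis(n_components=1).fit(items)

print("PCA variance explained by component 1:",
      round(float(pca.explained_variance_ratio_[0]), 2))
print("EFA loadings:", efa.components_.round(2))
print("EFA unique (noise) variances:", efa.noise_variance_.round(2))
```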
Describe the EFA and the CFA
Exploratory Factor Analysis (EFA) allows very little control over the latent variables, theories are introduced through rotation or forcing a set number of factors
Confirmatory Factor Analysis (CFA) requires you to exercise control over which items theoretically go with which latent variable; with the right software, not jamovi (yet), you could even specify expectations for how strong the relations should be
What should be the process for CFA
If you go back and look at the process for EFA, it has a very similar structure
The main difference here is that we need an a priori theory and we need to check that theory
The examination of the model fit dictates whether we are done or need to revise our model or our theory
What statistics should you look at in a CFA
use standardized estimates, preferably, recognizing that each of the factor loadings should be significant (p < .05)
examine the chi-squared as well as the CFI, TLI & RMSEA statistics
you want the chi squared to be non-significant, though that’s unlikely
you want the CFI & TLI to be above .95 (below .90 is very bad)
you want the RMSEA to be below .05 (not more than .08 upper CI)
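A minimal sketch that simply encodes the cut-offs above as a checklist (the function name and example values are hypothetical; this is not jamovi output or a CFA implementation).
```python
def check_cfa_fit(chi_sq_p: float, cfi: float, tli: float,
                  rmsea: float, rmsea_upper_ci: float) -> dict:
    """Return pass/fail flags for the common CFA fit criteria listed above."""
    return {
        "chi_sq_nonsignificant": chi_sq_p > .05,   # desirable but unlikely
        "cfi_ok": cfi > .95,                       # below .90 is very bad
        "tli_ok": tli > .95,
        "rmsea_ok": rmsea < .05 and rmsea_upper_ci <= .08,
    }

# Hypothetical fit statistics, for illustration only.
print(check_cfa_fit(chi_sq_p=.01, cfi=.97, tli=.96, rmsea=.04, rmsea_upper_ci=.07))
```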
Give the definitions of "standardized" (3)
- turning into z-scores
- norming
- an expected set of procedures; that the items on a questionnaire were previously established and we didn't change them; that a measure has established normative information available; or some combination of all these. "Unstandardized" means one or more of these is missing
What is a proprietary measure, and how do standardized and unstandardized measures compare
Another big difference is that many standardized measures are proprietary; that is, you have to pay to use them.
Standardized: Controlled administration conditions, normative information available, interpretation guidelines available, instruction manual available, carefully vetted (?), and probably costly
Unstandardized: May not require controlled conditions, typically lacks normative information, less interpretable for giving feedback, mainly useful for research purposes
Are proprietary measures of personality better?
for measuring personality, free tests are better: more reliable and efficient
What is a hierarchical factor analysis
You may recall I mentioned that it’s possible to theorize and (with different software) test multiple levels of latent variables. There’s a name for this.
Hierarchical Factor Analysis is one where you have proposed one or more latent variable(s) causing changes in one or more other latent variable(s)
The idea here is you have a primary latent variable that influences the other latent variables
Why use a hierarchical factor analysis
We’re often focused on measuring just one factor, but this isn’t always what we want.
A test for a course may not capture just one construct
An IQ test captures G (overall intelligence), but it also has multiple components within it
A personality “test” assesses many aspects of personality; the Big 5: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism
they aren't really tests, they are questionnaires
What inter-item correlations should we expect between latent variables in a hierarchical factor analysis
above .3, no more than .8
Why use experiments
In experimental research we are usually trying to manipulate something, and to observe the effect of that manipulation on an outcome.
Many studies use performance on some task as their outcome
Often, they just compare two different groups of people as the “manipulation”
Very rarely are psychometrics evaluated in these contexts
Why should we evaluate the psychometric properties of experiments
If we don’t, we may:
Make less-than-optimal design decisions (e.g., power, task selected)
Fail to find predicted results
Incorrectly interpret results
Define discrimination, reliability and validity in a cognitive task
Discrimination: Scores aren’t entirely random (e.g., high/low performance by some individuals)
similar to good distribution and range in a questionnaire; can be thought of as criterion validity
Reliability: Scores are a precise estimate of one’s ability
Validity: Scores reflect the right cognitive faculty
What is the risk of assuming the same reliability for control and treatment groups
The control group may not have the same reliability, because of response biases
Assuming reliability is equal could lead to incorrect conclusions
What is the difference between tests and questionnaires
For tests, items have right and wrong answers, whereas for questionnaires they do not (except with respect to the truth of the response for that individual).
validity, reliability but also suitability of purpose is important for tests
We are still interested in psychometric qualities for tests, but not all concepts apply equally well:
Internal consistency reliability may not make much sense
This depends on how many factors we expect to capture using the test, and how strongly correlated the factors are
Define discrimination and difficulty for tests
For proper tests, there are two important statistics to consider – in addition to reliability and validity.
Discrimination: The ability to accurately separate individuals into high/low performers
Discrimination is based on a relatively simple formula, especially compared with everything else we’ve seen.
Discrimination = Ph/Nh - Pl/Nl, where:
Ph = number of people in the high-performing group responding correctly
Pl = number of people in the low-performing group responding correctly
Nh = number of people in the high-performing group
Nl = number of people in the low-performing group
Difficulty: the likelihood of getting a question wrong (1 minus the proportion of correct responses)
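A minimal sketch of the discrimination index for one hypothetical item scored 1 (correct) or 0 (incorrect); the response vectors and the high/low split are illustrative assumptions.
```python
import numpy as np

# Item scores for the high- and low-performing groups
# (the split itself would come from total test scores).
high_group = np.array([1, 1, 1, 0, 1, 1, 1, 1, 0, 1])
low_group  = np.array([0, 1, 0, 0, 1, 0, 0, 1, 0, 0])

discrimination = high_group.mean() - low_group.mean()   # Ph/Nh - Pl/Nl
print(round(float(discrimination), 2))                  # closer to 1 = better separation
```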
What is the goal of discrimination in tests
Remember, the goal of discrimination is good categorization of individual cases into high and low performers.
That means we ideally have approximately half of people getting items wrong, and we want to be able to predict in which half an individual falls with relatively good accuracy – for each item
So, there is typically a connection between discrimination ability and item difficulty
How do you calculate difficulty in tests
The item’s discrimination score should be interpreted in light of the item’s difficulty score.
Difficulty is 1 minus the proportion of correct responses from all individuals
Difficulty = 1 - Correct/N
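A minimal sketch of the difficulty formula for the same kind of 0/1 item data (the responses are made up for illustration).
```python
import numpy as np

responses = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1])   # 1 = correct
difficulty = 1 - responses.mean()                             # 1 - Correct/N
print(round(float(difficulty), 2))                            # ~.50 is optimal for research use
```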
How would you optimize a test for research
If you are designing a test for research purposes, you want to maximize the variability in the scores (think, full range of possible responses).
That means you should aim for most items to have a difficulty score near the optimal .50
This gives you equal room above and below, across participants
With optimal difficulty, you have optimized the probability of getting a high discrimination score – though it is by no means certain
This means your item can separate good performers from poor performers
What are some psychological considerations when creating tests
People are naturally inclined to want to do “well” on their tests. For research, this may or may not be a key concern – it depends on the test.
Getting just over 50% on a test doesn't fit most people's definition of "doing well", even if it is above average
Pretty much only psychometricians would think this way
So, sometimes you need to sacrifice some psychometric quality to accommodate psychological considerations
Considering a test as a whole, there may be utility in having some very difficult items even if that means they have poor discrimination (this would depend on your choice of split, of course)
How can you get around psychological considerations of tests
If it’s an option, standardization can help resolve the contrast between good psychometrics and psychological considerations.
In this case, standardization means taking the raw scores, which are likely very low, and converting them into a new score that looks more acceptable
Percentiles would be one way of doing it
IQ scores typically use standardization via Z scores, where a Z of 0 becomes 100 (a psychologically pleasing number) and each Z difference of 1 adds or subtracts 15 points
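A minimal sketch of this z-score standardization on hypothetical raw scores (the numbers are made up): convert to z-scores, then rescale to a mean of 100 and an SD of 15.
```python
import numpy as np

raw = np.array([12, 15, 9, 20, 14, 11, 17, 13])   # hypothetical raw test scores
z = (raw - raw.mean()) / raw.std(ddof=1)           # convert to z-scores
iq_style = 100 + 15 * z                            # rescale to mean 100, SD 15
print(iq_style.round(1))
```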
If you have multiple factors in your factor analyses, how should you assess reliability?
Run one reliability analysis per factor (e.g., 2 factors = 2 reliability analyses)
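A minimal sketch of per-factor reliability using Cronbach's alpha; the item groupings and data are simulated assumptions (in practice the groupings come from your factor analysis).
```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: observations x items matrix for one factor's items."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

rng = np.random.default_rng(5)
latents = rng.normal(size=(300, 2))
factor1_items = latents[:, [0]] + rng.normal(scale=0.7, size=(300, 4))
factor2_items = latents[:, [1]] + rng.normal(scale=0.7, size=(300, 4))

# One reliability analysis per factor, as described above.
for name, block in [("Factor 1", factor1_items), ("Factor 2", factor2_items)]:
    print(name, "alpha =", round(cronbach_alpha(block), 2))
```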
Contrast differences between PCA and EFA
For PCA, it is the covariance matrix rather than the correlation matrix, and there is no real definition of random error
PCA tends to produce higher correlations because what EFA would attribute to error (unexplained variance), PCA attributes to covariance (explained variance)
What is the advantage of doing factor analysis rather than just averaging out correlations
Factor analyses can account for error and make better use of the "true score", allowing us to build more precise models
What do we need for good measurements
We need good theories
We need to capture all our dimensions well
We need reliability
We need convergent validity (and ideally also discriminant validity)
Describe the issue of content validity with the depression scales
There are 280 different measures of depression.
The most cited scale for depression is the CES-D
The CES-D has 20 items and 1/3 of them do not appear in any other commonly used measure of depression
In other words, the other 279 research teams did not agree that a third of those questions should be used to measure the construct of depression
this could lead us to believe this scale measures depression plus another component
This could lead to p-hacking, where one chooses the depression scale that fits their hypothesis and rejects the ones that don't in order to demonstrate significance
Describe the issue of reliability with the depression scales
The situation isn’t any better when looking at reliability for depression.
Inter-rater reliability of major depression diagnoses was .28 (presumably this is an r)
Cronbach’s alpha has many limitations, but in most cases is the only statistic ever used to demonstrate reliability
At least 20% of studies don’t even report alpha
Some report other people’s alpha, not in relation to their own sample
There is a disconnect between construct validity and reliability, even though they should go together
How is removing items from a scale bad for validity
Adding or removing items is a common practice, without providing a clear reason for doing so
Changing items isn’t necessarily a bad thing; the lack of justification is
Many studies don’t try to demonstrate validity, and many that do simply provide a citation to some other study