Part 2 (Final) Flashcards
What is a latent variable
latent variables are variables that are not directly observed but are rather inferred (through a mathematical model) from other variables
What are two things we are expected to measure in an observation
true score: the real/expected influences on our measurements
error: undefined/unexpected influences on our measurements
ideally, there is more true score than error; however, this is often not the case
Why do we want enough questions in our questionnaire when it comes to error
if we have enough questions measuring the construct well, we can overcome error by cancelling out the randomness of overestimation vs underestimation
How can you reduce error
Combining measures
like layering many pictures that each contain only 10% of the information: combined, they give a clearer idea of the true score
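This cancelling-out can be illustrated with a quick simulation (a sketch, not from the course materials): each simulated response is the true score plus random error, and the average of many error-prone items lands much closer to the true score than a single item does.

```python
import random

random.seed(1)

TRUE_SCORE = 50  # the latent quantity we are trying to measure

def one_item():
    # classical model: each response = true score + random error
    return TRUE_SCORE + random.gauss(0, 10)

def combined_measure(n_items):
    # averaging lets overestimates and underestimates cancel out
    return sum(one_item() for _ in range(n_items)) / n_items

trials = 200
err_single = sum(abs(one_item() - TRUE_SCORE) for _ in range(trials)) / trials
err_combined = sum(abs(combined_measure(50) - TRUE_SCORE) for _ in range(trials)) / trials
print(err_single, err_combined)  # the combined measure's average error is far smaller
```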
What is reliability in the context of a questionnaire
In a world where our measurements are all error-prone, identifying patterns across multiple responses becomes critical. This is consistency
i.e. we need the latent variable to be influencing all responses to at least some extent producing a common pattern in the responses
the influence of the latent variable is the true score
any other influence is error
What is the classical measurement model
The classical measurement model has 3 key assumptions
remember: an assumption is something we expect to be true
- The individual items of a questionnaire each have error and true score. The amount of error in any given item varies randomly. The mean error across items is zero (given a sufficient N)
- The error in one item is not correlated with the error in any other
- The error in the items is not correlated with the true score
What is the parallel test model
Extends the classical measurement model with 2 more assumptions to be more practical
- The latent variable influences all items equally
all item-construct correlations are the same
- Each item has the same quantity of random error
the combined influences of all other factors are the same
Each item is true score + error, so if you reduce the amount of true score (latent variable influence) then you would increase the amount of error
Name the 5 assumptions of the parallel test model
- Only random errors
- Errors are not correlated with each other
- Errors are not correlated with the true score
- The latent variable affects all items equally
- The amount of random error for each item is equal
What is the essentially tau-equivalent model
to avoid violating our assumptions it’s helpful to loosen them a bit
- Only random errors
- Errors are not correlated with each other
- Errors are not correlated with true score
- The latent variable affects all items equally only when standardized
differences are due to constants
e.g. all questions are turned into z-scores; we might have different response formats across our questions, and if we do not convert them into z-scores, it might not look like they are all affected equally
- The amount of random error for each item is not necessarily equal
partly a consequence of not standardizing
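As a sketch (Python, not part of the notes), converting items with different response formats to z-scores puts them on a common scale:

```python
from statistics import mean, pstdev

def z_scores(responses):
    # rescale raw responses so the item has mean 0 and SD 1;
    # items with different response formats then share a common metric
    m, s = mean(responses), pstdev(responses)
    return [(x - m) / s for x in responses]

likert = [1, 2, 4, 5, 3]        # hypothetical 1-5 Likert item
analog = [12, 35, 80, 95, 55]   # hypothetical 0-100 analog slider item

print(z_scores(likert))
print(z_scores(analog))
```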
If you are using a mix of Likert-type, analog and dichotomous questions. Which model is better for you?
Essentially Tau-Equivalent Model because less likely to violate our assumptions
What is the congeneric model
The congeneric model is much less strict
- Random error is only preferred, but not necessary
- Errors are preferably not correlated with each other, but can be
- Errors are not correlated with the true score
- The latent variable affects all items in some way
- The amount of random error for each item is not necessarily equal
Compare the models in strictness
- Starting points: classical measurement model + parallel test
- Common & somewhat strict: essentially tau-equivalent + congeneric
- Much less strict: general factor (allows for multiple latent variables in each measure)
What are additional assumptions that correlation analysis will add to our model
- you have interval-level data
probably not true, usually ordinal even if close to interval
- your data follow a normal distribution
not too well, without interval data
- a straight line is the best way to represent the relation
this is probably true
Explain the assumption of linearity
If the line of best fit should be u-shaped, you have a big problem
A straight line fit to these data would be flat, indicating no relation is present
luckily, for reliability, we’re talking about measuring the same construct across two different questions, so it’s quite unlikely for their relation not to follow a straight line
Name the two deviations from normality
Skewness: the presence of a longer than normal tail
Kurtosis: the presence of a taller or wider than normal spread
What does deviation from normality threaten
there are two main categories of deviation from normal (skewness and kurtosis), and they both threaten the validity of all these models
Describe skewness
Either the left or the right tail could be pulled out
Skewness means your distribution has asymmetry
A negative skewness means the left tail is long
A positive skewness means the right tail is long
Describe kurtosis
The peak can be pulled up or pushed down
There are two kinds of kurtosis as well
a negative kurtosis means the curve is flatter
a positive kurtosis means the curve is taller
Kurtosis assumes a symmetrical distribution; something is wrong on both sides of the distribution
How do we assess whether kurtosis and skewness are too much
We assume a normal distribution, but don’t typically get it in practice
there will be some degree of skew
there will be some degree of kurtosis
If either score is beyond +/- 3, it indicates there might be too much of a problem
OR
Multiply the standard error (SE) by 3; if the skewness or kurtosis is bigger than that number, then it's too much
For that rule to make sense, you need a fairly small sample size, since SE decreases as n increases (even though larger samples meet the assumptions better overall)
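Both rules of thumb can be sketched as follows (the SE approximations below are standard textbook formulas, not given in the notes):

```python
import math
from statistics import mean, pstdev

def skewness(xs):
    # standardized third moment: positive -> long right tail
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

def excess_kurtosis(xs):
    # standardized fourth moment minus 3: negative -> flatter than normal
    m, s = mean(xs), pstdev(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * s ** 4) - 3

def too_much(stat, se):
    # rule 1: |statistic| beyond +/- 3
    # rule 2: |statistic| bigger than 3 * SE
    return abs(stat) > 3 or abs(stat) > 3 * se

data = [1, 2, 2, 3, 3, 3, 4, 4, 12]  # long right tail
n = len(data)
se_skew = math.sqrt(6 / n)           # approximate SE of skewness
se_kurt = math.sqrt(24 / n)          # approximate SE of kurtosis
print(skewness(data), too_much(skewness(data), se_skew))
```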
How can large error be problematic for reliability
extremely large amounts of error will prevent you from observing any interesting associations (because it’s random)
How do we determine whether we have too much error?
We estimate it by correlating a measure with itself
Internal consistency: whatever is not true score must be error
Name the 5 forms of internal consistency measures
- Cronbach’s alpha
- Split-half
- Test-retest
- Alternate Forms
- Omega
How common is Cronbach’s alpha
Most commonly reported
Easy to use with jamovi (and SPSS)
Briefly describe Split-half
Less common than alpha
Easy to use too (but only manually)
Usually split by odd and even item numbers; correlate the scores from the even questions with the scores from the odd questions
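A manual sketch in Python: correlate odd-item totals with even-item totals. The Spearman-Brown step-up at the end is a standard companion to split-half (not named in the notes); it corrects for each half being only half the questionnaire's length.

```python
from statistics import mean

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def split_half_reliability(responses):
    # responses: one list of item scores per participant
    odd = [sum(r[0::2]) for r in responses]   # items 1, 3, 5, ...
    even = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)  # Spearman-Brown correction

# hypothetical data: 5 participants x 4 items
responses = [[1, 2, 1, 2], [3, 3, 4, 3], [5, 4, 5, 5],
             [2, 2, 1, 2], [4, 5, 4, 4]]
print(split_half_reliability(responses))
```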
Briefly describe Test-retest
Optimal, when possible
Requires twice the resources
only really good for traits, not so good for states
Briefly describe Alternate forms
Even less common
Requires two equivalent measures
make two different versions of the questionnaire that measure exactly the same thing; alternate who gets version A and who gets version B, then administer the other version later. The higher the correlation between the two versions, the more reliable the measure
Briefly describe omega
Newest form
No tau-equivalence or error equivalence is required (items are still true score + error)
based on congeneric model (loose model requiring a single latent variable)
harder to violate its assumptions, but harder to calculate, which is fine with jamovi
more than one calculation for omega
Describe the key assumptions of Cronbach’s alpha
Based on the Essential Tau-equivalence model (depending on how it's calculated), so it is built on several key assumptions
- There is a single latent variable
- Error is always random
if not true, it will affect how reliable we believe our questionnaire to be
- True score influences items equivalently
- Inter-item correlations are equivalent
- All inter-item correlations would be equal in a large enough sample
in an infinitely large sample
- Items would have equal variability in a large enough sample
- More items will estimate true score and error better
How do you interpret a Cronbach's alpha
Like a correlation coefficient, it ranges from -1 to 1, but you do not want to see a negative value
For research purposes, a good alpha is a = .70 or better
For other purposes, a good alpha is a = .80 or better (and it may need to be even higher for diagnosis)
We won’t hit 1 since we are going to violate the assumption to some extent
Can be thought of as the proportion of true score that made its way through the measure
if a = 0.8 → 80% true score is captured by the measure
how can you reach a large alpha score
Either high inter-item correlation OR a large correction (many items of equivalent value)
Alpha tends to increase when you add more items of equivalent value
What will likely happen if you make your questionnaire too long
while it might seem like good reliability can be achieved simply by making very long questionnaires, this isn’t really true
Remember, when we make our questionnaires too long, people lose interest in our questions and that changes the average inter-item correlation
as k increases, rbar will most likely decrease
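The trade-off shows up in the standardized alpha formula, alpha = k*rbar / (1 + (k-1)*rbar) (a standard formula, not spelled out in the notes):

```python
def standardized_alpha(k, rbar):
    # standardized Cronbach's alpha from the number of items (k)
    # and the average inter-item correlation (rbar)
    return k * rbar / (1 + (k - 1) * rbar)

# more items of equivalent value push alpha up...
print(round(standardized_alpha(5, 0.30), 2))   # 0.68
print(round(standardized_alpha(10, 0.30), 2))  # 0.81
# ...but if a very long questionnaire bores people and rbar drops,
# the gain is wiped out
print(round(standardized_alpha(20, 0.15), 2))  # 0.78
```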
What is a common mistake with reliability measures
You can never forget that reliability is NOT a property of a measure itself
How much true score is captured by a measure depends on the sample of people who completed it
some samples may interpret items differently, affecting error
It’s not sufficient to simply quote a previously published reliability statistic - you can start from there, but verify for your sample
generational understanding of the wording can affect you sample, people change across time thus affecting reliability
How can you resolve problems with a low alpha
When alpha is lower than we would like, or even if it’s not, you can consider improving it by looking at whether to drop bad items
Whenever an item is not really ‘equivalent’ to the others, as alpha assumes, then dropping it will change alpha
if the dropped item has a low inter-item correlation, alpha goes up
if the dropped item has a high inter-item correlation, alpha goes down
But a questionnaire with more “bad” items can still be better overall than one with a small number of ‘good’ items
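The item-dropping check can be sketched like this (a hypothetical illustration using the standard variance-based alpha formula, with made-up data):

```python
from statistics import pvariance

def cronbach_alpha(items):
    # items: one list of scores per item, aligned by participant
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(pvariance(i) for i in items)
                          / pvariance(totals))

def alpha_if_dropped(items):
    # recompute alpha with each item removed in turn
    return [cronbach_alpha(items[:i] + items[i + 1:])
            for i in range(len(items))]

# three coherent items plus one 'bad' uncorrelated item
items = [[1, 2, 3, 4, 5, 6],
         [2, 2, 3, 4, 5, 5],
         [1, 3, 3, 4, 4, 6],
         [3, 1, 4, 1, 3, 2]]  # low inter-item correlation
print(cronbach_alpha(items))       # alpha with all four items
print(alpha_if_dropped(items)[3])  # alpha rises when the bad item is dropped
```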
Why is omega a better alternative than alpha
McDonald’s Omega is a better choice in almost every case (possibly all) and should be the go-to reliability statistic from now on
Based on the congeneric model; its restrictions are easier to meet
- interpretation is the same as for alpha (easy replacement)
- not all items need to be positively correlated (but still a good idea)
- not all correlations need to be equivalent
- when none of alpha’s assumptions are violated, it gives the exact same results; when they are, it is less of an underestimation
What is validity
the state of being valid; the degree to which we are measuring whatever it was that we wanted to measure
like reliability, this isn’t simply a property of the measure itself
for a measure to have validity, we must consider whether it’s being used in the correct context
must be sure we are using it with the correct type of people
What are the three main forms of validity we should assess
Content validity: key issue is deciding what should be included
Criterion validity: key issue is deciding whether the measure ‘works’
usually applied in a predictive sense, can I predict the outcome
Construct validity: key issue is complex
Content and construct validity are always important; criterion validity only in certain contexts
What is content validity
This form is easy to understand, but hard to be sure you’ve achieved it
to be sure you’ve represented the full range of possible content, you need to have an excellent understanding of what needs to be asked
what are all the key thoughts, behaviours, skills, etc.
have you represented their relative importance properly?
how much of each aspect goes into the total score for the measure?
Example: the midterms for this course try to address the full range of important topics discussed
How can you assess content validity
Successful representation of all the critical aspects of the construct is usually determined by two things
adherence to the prior literature
expert review of your measure
What is criterion validity
This form requires you to evaluate how well scores on your measure match with an accepted ‘gold standard’ or tangible outcome assessment.
For example
delinquency assessment: correlate your measure with the number of offences committed
job suitability: correlate your measure with ratings of job performance
Graduate Record Exam: correlate your measure with grad school GPA
What is construct validity
Two separate definitions
An overall category that encompasses all other forms of validity
Assessing the relation of your measure with other theoretically-relevant measures
What are the two sub categories of construct validity
convergent validity: do you have expected relations to theoretically related constructs?
discriminant validity: do you have the expected lack of relation to theoretically unrelated constructs?
What is convergent validity
To properly establish convergent validity, you need to be sure the related constructs do not themselves contain facets of the construct you’re measuring.
For example:
Mindfulness should show a positive correlation with the Openness to Experience subconstruct of the Big5 personality measure and a negative correlation with Neuroticism
ADHD should show positive correlations with Memory Failures, and negative correlations with Mindfulness
convergent means a relationship: either positive or negative while discriminant should show NO relationship
What is discriminant validity
to properly establish discriminant validity, you need to have good reasons for choosing the potentially unrelated (or very weakly related) constructs. For example:
Intelligence is only weakly related to Artistic Ability - this is a theoretically relevant, not random choice
Memory Failures show no relation to Internal Locus of Control - this could be important as both LOC and MF are typically related to Depression
What is face validity
The overlooked middle child (by psychometricians) of validity addresses the question of whether the measure seems to measure the right thing
It is critical to at least consider whether there are other reasonable interpretations
Outside of psychometrics, this form is frequently used as it’s the easiest to claim and not based on correlation
The basic idea is: any reasonable person would say you measured the right thing
What is internal validity
Strongly related to the specific ways of establishing validity. It’s simply the extent to which you can account for other plausible explanations
You need to either rule out alternatives, or somehow argue they’re not plausible after all
Error, particularly biased responding, will again be a concern as it increases the likelihood of threats to internal validity
Name 6 types of systematic error
- order effects: practice, fatigue, boredom, context effects, etc.
- motivation: we could have incomplete data, changed context effects
can lead to more response sets
- distraction: similar issues to a lack of motivation
- sampling bias: when we haven’t sampled the right people
- maturation: mainly problematic for predictive criterion validity
- instrumentation: when our questions get less useful over time
What does systematic error affect
Some common sources of systematic error affect only validity
Reliability is concerned with random error, while validity is also concerned with systematic error
How should we measure validity
Aside from face validity, we use correlations between our measure and another measure to quantify and demonstrate validity
We need to anticipate more modest correlations for validity than we would for reliability, however
the expected size should be based on prior literature and theory
an r up to .10 is good for showing discriminant validity
an r of about .60 is very good for showing convergent validity; values between .2 and .7 are the usual cutoffs
an r of about .85 or higher is usually very bad for showing convergent validity, too close to 1 and would be hard to say they are actually two different concepts
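These rough cutoffs could be encoded in a hypothetical helper (the thresholds come from the notes; in practice the expected size should come from prior literature and theory):

```python
def judge_convergent(r):
    # hypothetical interpretation of a convergent-validity correlation
    if abs(r) >= 0.85:
        return "too close to 1 - hard to argue these are two constructs"
    if abs(r) >= 0.2:
        return "supports convergent validity"
    return "too weak to support convergent validity"

def judge_discriminant(r):
    # an r up to .10 is good for showing discriminant validity
    if abs(r) <= 0.10:
        return "supports discriminant validity"
    return "relation too strong for discriminant validity"

print(judge_convergent(0.60))    # supports convergent validity
print(judge_discriminant(0.05))  # supports discriminant validity
```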
If systematic error increases, what happens to validity measures
convergent validity
you are likely to over- or underestimate the correlation
discriminant validity
you are likely to get a higher correlation than you would want
What are major flaws of Brown & Ryan’s questionnaire
all of the items are reverse scored
The MAAS items capture only one part of the full construct of mindfulness
In practice, how much error should we expect
More often than not, we have much less true score than error
What does it mean to have “enough signal to noise”
Ideally, you need enough true score (signal) to overcome the randomness of error (noise)
What is another name for the Essentially Tau-equivalent Model
also called essentially equivalent to true score model
How is alpha calculated
there are many ways to calculate it. It is often based on the essentially tau-equivalent model, but jamovi uses a mix with the parallel test model
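One common computation is the variance-based formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores) (a standard form, not spelled out in the notes; the data below are made up):

```python
from statistics import pvariance

def cronbach_alpha(items):
    # items: one list of scores per item, aligned by participant
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(pvariance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# hypothetical data: three items answered by six participants
items = [[1, 2, 3, 4, 5, 6],
         [2, 2, 3, 4, 5, 5],
         [1, 3, 3, 4, 4, 6]]
print(cronbach_alpha(items))  # high: the items move together
```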
In a negatively and positively skewed distribution, where can you find the measures of central tendency
from left to right
Negatively skewed: mean, median, mode
Positively skewed: mode, median, mean
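A quick check with a small positively skewed sample (hypothetical data; the ordering holds for typical unimodal skewed distributions):

```python
from statistics import mean, median, mode

data = [1, 1, 1, 2, 2, 3, 4, 5, 9]  # long right tail -> positive skew

# from left to right: mode, then median, then mean
print(mode(data), median(data), mean(data))
```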
Why is it bad to have too much skewness
High levels of skew are a source of error that will impact the accuracy of our correlations
What is a caveat of the kurtosis measurement
It’s possible to calculate a big positive or negative kurtosis measure without a normally shaped distribution
What is a platykurtic distribution
flat distribution
What is a leptokurtic distribution
too skinny distribution
The different threats of systematic error are likely to compromise which type of validity
internal validity