Lecture 2 + Wa Flashcards
What are observed scores, true scores and measurement error?
Observed scores = the scores people obtain on a test that measures a certain ability or characteristic
True scores = the actual levels of that ability or characteristic that people have
Measurement error = the effect of random factors on the observed scores
(…) minus the error score equals (…). What should be on the dots?
observed score and true score, respectively
John’s couch is 200 centimeters wide. He measures it with a measuring tape and finds it to be 205 centimeters.
What is the observed score, true score and error score here?
observed = 205, true = 200 and error = 5
What is the central idea of classical test theory (CTT)?
Every test taker has a true score on a test
Why do observed scores not match true scores in practice?
Because measurement error exists and changes the observed score
What are two core assumptions of classical test theory?
- Observed scores are true scores plus measurement error, 𝑋𝑜 = 𝑋𝑡 + 𝑋𝑒
- Measurement error is random
You do not need to know the formulas; they only support basic understanding.
What follows from the two core assumptions of CTT?
- mean of 𝑋𝑒 = 0 (the mean of the measurement error equals 0), because error is random (a nonzero mean would indicate systematic error)
- 𝑟𝑡𝑒 = 0 (the correlation between true scores and error equals 0), because random error is unrelated to how high or low someone’s true score is
- observed score variance = true score variance + error variance (note that variance is s^2)
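A minimal Python sketch of these three consequences. The four test takers and their scores are made up, and the error values are chosen so that the error is exactly uncorrelated with the true scores; in real data these properties only hold approximately. All names in the code are illustrative and not from the lecture.

```python
from statistics import mean, pvariance

# Hypothetical toy data for four test takers (not real data).
true_scores = [10, 10, 20, 20]
errors = [-1, 1, -1, 1]  # random error that averages out to 0
observed = [t + e for t, e in zip(true_scores, errors)]  # Xo = Xt + Xe


def pearson_r(x, y):
    """Pearson correlation based on population (co)variances."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pvariance(x) * pvariance(y)) ** 0.5


mean_error = mean(errors)                             # consequence 1
r_te = pearson_r(true_scores, errors)                 # consequence 2
var_observed = pvariance(observed)                    # consequence 3, left side
var_sum = pvariance(true_scores) + pvariance(errors)  # consequence 3, right side

print(mean_error, r_te, var_observed, var_sum)
```

With this toy data the mean error and the true-error correlation are both 0, and the observed variance (26) equals the true score variance (25) plus the error variance (1).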
Reliability can be defined in two ways, which?
Proportion of variance. This basically means that if there is a high proportion of noise (error) the reliability will be low, if there is a high proportion of signal (true score variance), the reliability will be high. Aka proportion/ratio of true score variance to observed score variance (which is also the formula)
Shared variance. Reliability defined as the squared correlation between the observed scores and the true scores. A high squared correlation means a high reliability, and vice versa
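A small Python sketch showing that the two definitions give the same number. It assumes made-up true scores and errors for four test takers; in practice true scores are unknown, so this only illustrates the definitions and is not a way to estimate reliability. Variable names are illustrative.

```python
from statistics import mean, pvariance

# Hypothetical toy data (true scores are never observable in practice).
true_scores = [10, 10, 20, 20]
errors = [-1, 1, -1, 1]
observed = [t + e for t, e in zip(true_scores, errors)]

# Definition 1: proportion of true score variance in observed score variance.
rel_ratio = pvariance(true_scores) / pvariance(observed)

# Definition 2: squared correlation between observed and true scores.
mo, mt = mean(observed), mean(true_scores)
cov_ot = sum((o - mo) * (t - mt) for o, t in zip(observed, true_scores)) / len(observed)
r_ot = cov_ot / (pvariance(observed) * pvariance(true_scores)) ** 0.5
rel_shared = r_ot ** 2

print(round(rel_ratio, 4), round(rel_shared, 4))  # 0.9615 0.9615
```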
Estimating reliability requires multiple tests. What are three ways to obtain them? Explain.
Split-halves = you have two tests because you split one test in half
Test-retest = administer the same test again
Cronbach’s alpha and omega = each item counts as its own test
There are four test models regarding the reliability of a test. Which are these (do not explain)?
Parallel, tau-equivalent, essentially tau-equivalent and congeneric test
What are the restrictions/implications of the parallel test model?
Restrictions = true scores are equal across both tests and the error variances of both tests are equal
Implications = mean and variance of the true scores are equal, and so are the mean and variance of the observed scores. Correlation between true scores = 1 and the reliabilities are equal
> also most restrictive
Which reliability estimates are based on the parallel test model?
Split-halves and test-retest
What are the restrictions/implications of the tau-equivalent test model?
Restriction = true scores are equal across both tests (error variances may differ)
Implications = mean and variance of the true scores are equal, but of the observed scores only the mean is equal. Correlation between the true scores = 1
What are the restrictions/implications of the essentially tau-equivalent test model?
Restriction = true scores may differ by a constant between the tests, so the true score means need not be equal
Implications = variances of the true scores ARE equal and the correlation between the true scores = 1
Which reliability estimate is based on the essentially tau-equivalent test model?
Cronbach’s alpha
What are the restrictions/implications of the congeneric test model?
True score means and variances, error variances, observed score means and variances, and reliabilities may all differ (the mean error is still 0); the correlation between the true scores is still 1
> least restrictive
Which reliability estimate is based on the congeneric test model?
Omega
What are three methods of reliability estimation (CTT)? Explain them.
Alternate forms (parallel model, apply two versions of same test, correlation = reliability)
test-retest (parallel, same test twice, correlation = reliability)
Internal consistency (parallel or essentially tau-equivalent model, blocks of items = tests, a more complicated formula = reliability)
Challenges of the three reliability estimation methods (CTT)?
Alternate forms = constructing two truly parallel versions of the test
Test-retest = true scores may change between the two measurements
Both of the above, plus internal consistency, can have problems with carry-over effects
Internal consistency has three further variants. Which are these? Explain.
Split-half (parallel model, two halves, formula = reliability)
Cronbach’s alpha (KR20 for binary items, essentially tau-equivalent model, each item is a test, formula = reliability)
Omega (congeneric model or stricter, true score variance from factor analysis, reliability = true score variance / observed score variance)
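As a sketch of what the formula for Cronbach’s alpha looks like, the Python below uses the common variance-based form alpha = k/(k-1) × (1 − sum of item variances / variance of sum scores). The data set (4 respondents, 3 items) is made up, and the function name is illustrative, not from the lecture.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Raw Cronbach's alpha; rows = one list of item scores per respondent."""
    k = len(rows[0])                                   # number of items
    items = list(zip(*rows))                           # one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in rows])   # variance of sum scores
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses of four people to three items.
scores = [
    [1, 2, 1],
    [2, 3, 3],
    [3, 3, 4],
    [4, 5, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 3))  # 0.947
```

For binary (0/1) items the same computation corresponds to KR20.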
COTAN guidelines about reliability?
High impact (individual) = good: 0.9 or larger, sufficient: between .8 and .9, insufficient: < .8
Less impact (individual) = good: > .8, sufficient: between .7 and .8, insufficient: < .7
Group = good: > .7, sufficient: between .6 and .7, insufficient: < .6
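The thresholds above can be written as a small lookup, sketched here in Python. The function name, the category labels, and the choice to treat every boundary value as belonging to the higher category are assumptions made for this example, not part of the COTAN text.

```python
# Thresholds as listed on the card: (lower bound for "good", lower bound for "sufficient").
COTAN_THRESHOLDS = {
    "high_impact_individual": (0.9, 0.8),
    "less_impact_individual": (0.8, 0.7),
    "group": (0.7, 0.6),
}


def cotan_rating(reliability, use):
    """Classify a reliability value; boundary handling is an assumption."""
    good, sufficient = COTAN_THRESHOLDS[use]
    if reliability >= good:
        return "good"
    if reliability >= sufficient:
        return "sufficient"
    return "insufficient"


print(cotan_rating(0.85, "high_impact_individual"))  # sufficient
print(cotan_rating(0.85, "group"))                   # good
```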
What is the corrected item-total correlation?
Correlation between the item scores and the rest scores, i.e. the sum of the other items (item discrimination)
What is the item-total correlation?
Correlation between the item scores and the sum scores (the corrected one is better because in the uncorrected version the item is partly correlated with itself)
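A short Python sketch of the difference between the two statistics, using made-up responses of four people to three items. Function names are illustrative.

```python
from statistics import mean


def pearson_r(x, y):
    """Plain Pearson correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def item_total(rows, j):
    """Correlation of item j with the sum score (item j included)."""
    item = [row[j] for row in rows]
    totals = [sum(row) for row in rows]
    return pearson_r(item, totals)


def corrected_item_total(rows, j):
    """Correlation of item j with the rest score (item j excluded)."""
    item = [row[j] for row in rows]
    rest = [sum(row) - row[j] for row in rows]
    return pearson_r(item, rest)


# Hypothetical data: one row per respondent, one column per item.
scores = [[1, 2, 1], [2, 3, 3], [3, 3, 4], [4, 5, 4]]
for j in range(3):
    print(j, round(item_total(scores, j), 2), round(corrected_item_total(scores, j), 2))
```

In this toy data the corrected value is lower than the uncorrected one for every item, because the item no longer contributes to the score it is correlated with.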
Which factors affect reliability?
Test length (↑ length = ↑ reliability)
Sample heterogeneity (heterogeneous samples ↑ reliability because ↑ true score variance)
Correlation between pretest and posttest (a large correlation between pretest and posttest ↓ reliability of the difference scores, because little true difference variance is left relative to the error)
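For the test length factor, one standard way to express the effect in CTT is the Spearman-Brown formula, which is not named on the card and is brought in here only as an illustration. It assumes the added (or removed) items are parallel to the existing ones. A minimal Python sketch with made-up numbers:

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the added or removed items are parallel to the existing ones.
    """
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability .60.
doubled = spearman_brown(0.6, 2)    # twice as long
halved = spearman_brown(0.6, 0.5)   # half as long
print(round(doubled, 2), round(halved, 2))  # 0.75 0.43
```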
Why is reliability being reliant on heterogeneity undesirable?
Reliability should be a property of the test, not the sample
What does a small measurement error mean for the reliability?
It is higher
What is attenuation?
Reduction of an observed effect (e.g. a correlation or a group difference) caused by measurement error
What is Cohen’s d and what is its relationship with reliability?
An effect size (the number of SDs by which the groups differ); a less reliable test = a smaller observed Cohen’s d
This also means the group difference is less likely to be significant with less reliability
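A rough Python sketch of this attenuation. It assumes the simple CTT relation observed d = true d × √reliability, which follows from the observed SD being larger than the true score SD by a factor 1/√reliability, and it assumes group membership itself is measured without error. The numbers and the function name are made up.

```python
from math import sqrt


def attenuated_d(true_d, reliability):
    """Observed Cohen's d under the simplifying assumptions stated above."""
    return true_d * sqrt(reliability)


# Hypothetical true effect of 0.8 SD, measured with two different tests.
d_good_test = attenuated_d(0.8, 0.90)
d_poor_test = attenuated_d(0.8, 0.64)
print(round(d_good_test, 2), round(d_poor_test, 2))  # 0.76 0.64
```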
What happens to a correlation if reliability is poor?
Smaller observed correlation (again, less likely to be significant)
Consider two tests that purport to measure the same construct. In a pilot study, a researcher finds their observed test score means to be the same, but their test score variances to not be the same. Which of the test models do these data follow?
Tau-equivalent tests
Order the different test models on how restrictive they are from 1 (least restrictive) to 4 (most restrictive)
1 = congeneric, 2 = essentially tau-equivalent, 3 = tau-equivalent, 4 = parallel
In a hypothetical dataset that contains the test scores on two tests, the true score mean and true score variance differ across the two tests. Which test model does this dataset follow?
The congeneric test model
If you find the reliability of two tests measuring the same construct to be the same, what test model do these tests follow?
Parallel test model
In a hypothetical dataset that contains the test scores on two tests, the true score mean and true score variance are equal across the two tests. Which test model does this dataset follow?
Parallel AND tau-equivalent test models
Say you want to assess the consistency between the observed scores of one test and those of another test. Which method for estimating reliability do you use?
Alternate forms
Which of the following criteria do two test forms need to meet, in order to legitimately use the alternate forms method of estimating reliability?
The tests need to have identical true scores and identical error variance
Jimmy conducts a study into aggression, for which he uses the Aggression Questionnaire (AGQ; Buss & Perry, 1992). He wants to know how reliable the AGQ is. Therefore, he lets his respondents fill in the questionnaire again.
Which method of estimating reliability does Jimmy intend to use here?
Test-retest
What problem(s) may occur if you estimate reliability using the test-retest method?
True scores may change between the two measurements, and carry-over effects may occur
Indicate the properties of the three reliability measures below (raw alpha, standardized alpha, KR20)
Raw alpha = suitable for Likert scale items that do not differ too much in variance
Standardized alpha = suitable for Likert scale items that differ substantially in their item variance
KR20 = suitable for binary items
What can be said about the difference between the raw alpha, standardized alpha and KR20 procedure?
Raw alpha and KR20 are based on the item covariances and item variances; standardized alpha only uses item correlations
How does consistency affect reliability?
↑ consistency = ↑ reliability
A researcher wants to assess the effectiveness of an assertiveness training. To this end she first measures assertiveness in a sample of randomly selected subjects (pretest). Next she administers the training and measures assertiveness again (posttest). To draw a conclusion about the effectiveness of the training, she calculates difference scores (the pretest scores minus the posttest scores). The pretest has a reliability of 0.8, and the posttest has a reliability of 0.8. The correlation between the pretest and posttest is large (around 0.9). Which statement below is correct?
The reliability of the difference scores will be far below 0.8
> Notes: bc difference scores depend on the correlation btwn pre and posttest
If the pretest and posttest are both reliable, the reliability of the difference scores can still be relatively small if the pretest and posttest are…
… highly correlated
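A minimal Python sketch of why this happens, using the standard CTT formula for the reliability of difference scores. The pre-post correlations below are made-up illustrative values; they are kept below 0.8 because under CTT the correlation between two tests cannot exceed the square root of the product of their reliabilities. The function name and the equal-SD default are assumptions for the example.

```python
def difference_score_reliability(r_xx, r_yy, r_xy, sd_x=1.0, sd_y=1.0):
    """Reliability of the difference scores X - Y.

    r_xx, r_yy: reliabilities of the two tests; r_xy: their correlation.
    """
    numerator = sd_x**2 * r_xx + sd_y**2 * r_yy - 2 * r_xy * sd_x * sd_y
    denominator = sd_x**2 + sd_y**2 - 2 * r_xy * sd_x * sd_y
    return numerator / denominator


# Both tests have reliability .8; only the pre-post correlation changes.
for r_xy in (0.2, 0.5, 0.7):
    print(r_xy, round(difference_score_reliability(0.8, 0.8, r_xy), 2))
# 0.2 0.75
# 0.5 0.6
# 0.7 0.33
```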
A researcher develops a test to diagnose learning disabilities in children. He finds the following value for Omega: .732
According to the COTAN guidelines, how would you assess the reliability of this test?
insufficient
Hank is doing research into job satisfaction. He wants to find out what the correlation is between work-life balance and working conditions. He uses two separate tests to measure these constructs. The work-life balance measure has an almost perfect reliability of 0.97. However, the working conditions measure has a quite low reliability of 0.39.
What can be said about the observed correlation between the two measures that Hank will probably find?
It will be largely attenuated because one of the measures is quite poor, so even a very high true correlation will result in a moderate observed correlation
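A quick Python check of this with Hank’s two reliabilities, assuming the standard CTT attenuation relation observed r = true r × √(reliability of test 1 × reliability of test 2). The true correlations plugged in are hypothetical, and the function name is illustrative.

```python
from math import sqrt


def attenuated_r(true_r, r_xx, r_yy):
    """Expected observed correlation given the true correlation and both reliabilities."""
    return true_r * sqrt(r_xx * r_yy)


# Reliabilities from the card: 0.97 (work-life balance) and 0.39 (working conditions).
ceiling = attenuated_r(1.0, 0.97, 0.39)      # perfect true correlation
high_true = attenuated_r(0.9, 0.97, 0.39)    # hypothetical true correlation of .9
print(round(ceiling, 2), round(high_true, 2))  # 0.62 0.55
```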
Which statistic do we use when we want to know the consistency between one item and the other items of a test?
Corrected item-total correlation